Preface


The Common Desktop Environment: Internationalization Programmer's
Guide provides information for internationalizing the desktop, enabling
applications to support various languages and cultural conventions in a
consistent user interface.


Specifically, this guide: 
• Provides guidelines and hints for developers on how to write applications
for worldwide distribution.

• Provides an overall view of internationalization topics that span
different layers within the desktop.

• Provides pointers to reference and more detailed documentation. In some
cases, standard documentation is referenced.


This guide is not intended to duplicate the existing reference or conceptual 
documentation but rather to provide guidelines and conventions on specific 
internationalization topics. This document focuses on internationalization 
topics and not on any specific component or layer in an open software 
environment. 


Who Should Use This Book 


This book provides various levels of information for application
programmers, developers, and those in related fields.


How This Book Is Organized 
Explanations of the contents of this book follow: 


Chapter 1, “Introduction to Internationalization,” provides an
overview of internationalization and localization within the desktop,
including locales, fonts, drawing, input, interclient communication, and
extraction of user-visible text. Information on the significance of
internationalization standards is also provided.


Chapter 2, “Internationalization and the Common Desktop 
Environment,” covers the set of topics that developers commonly need to 
consider when internationalizing their applications, including locale 
management, localized resources, font management, localized text tasks, 
interclient communication for localized text, and internationalized 
functions. 


Chapter 3, “Internationalization and Distributed Networks,” 
discusses topics related to handling encoded characters in distributed 
networks. Basic principles and examples for interclient interoperability are 
provided to guide developers in internationalized distributed environments. 


Chapter 4, “Motif Dependencies,” covers internationalized
applications, locale management, localized text, international User
Interface Language (UIL), and localized applications.


Chapter 5, “Xt and Xlib Dependencies,” covers locale
management, localized text tasks, font set metrics, interclient
communications conventions for localized text, and charset and font set
encoding and registry information.


Appendix A, “Message Guidelines,” is a set of guidelines for writing 
messages. 
Related Publications


See the following documentation for additional information on topics 
presented in this book: 


ISO C: ISO/IEC 9899:1990, Programming Languages - C (technically
identical to ANSI X3.159-1989, Programming Language C).


ISO/IEC 9945-1: 1990, (IEEE Standard 1003.1) Information Technology -
Portable Operating System Interface (POSIX) - Part 1: System
Application Program Interface (API) [C Language].


ISO/IEC DIS 9945-2: 1992, (IEEE Standard 1003.2-Draft) Information
Technology - Portable Operating System Interface (POSIX) - Part 2: Shell
and Utilities.


OSF/Motif 1.2: OSF Motif 1.2 Programmer’s Reference, Revision 1.2, 
Open Software Foundation, Prentice Hall, 1992, ISBN: 0-13-643115-1. 


Scheifler, W. R., X Window System, The Complete Reference to Xlib,
Xprotocol, ICCCM, XLFD - X Version 11, Release 5, Digital Press, 1992,
ISBN: 1-55558-088-2.


X/Open: X/Open CAE Specification System Interface Definition, Issue 4,
X/Open Company Ltd., 1992, ISBN: 1-872630-46-4.


X/Open: X/Open CAE Specification Commands and Utilities, Issue 4,
X/Open Company Ltd., 1992, ISBN: 1-872630-48-0.


X/Open: X/Open CAE Specification System Interface and Headers, Issue
4, X/Open Company Ltd., 1992, ISBN: 1-872630-47-2.


X/Open: X/Open Internationalization Guide, X/Open Company Ltd.,
1992, ISBN: 1-872630-20-0.


ISO/IEC 10646-1:1993 (E): Information Technology - Universal Multi- 
Octet Coded Character Set (UCS). Part 1: Architecture and Basic 
Multilingual Plane 


What Typographic Changes and Symbols Mean


Table P-1 describes the type changes and symbols used in this book.


Table P-1 Typographic Conventions

Typeface
or Symbol   Meaning                          Example

AaBbCc123   The names of commands, files,    Edit your .login file.
            and directories; on-screen       Use ls -a to list all files.
            computer output                  system% You have mail.

AaBbCc123   Command-line placeholder:        To delete a file, type rm
            replace with a real name or      filename.
            value

AaBbCc123   Book titles, new words or        Read Chapter 6 in User’s Guide.
            terms, or words to be            These are called class options.
            emphasized                       You must be root to do this.

Code samples are included in boxes and may display the following:

%           UNIX C shell prompt              system%

$           UNIX Bourne and Korn shell       system$
            prompt

#           Superuser prompt, all shells     system#
Introduction to Internationalization 1


Internationalization is the designing of computer systems and applications 
for users around the world. Such users have different languages and may 
have different requirements for the functionality and user interface of the 
systems they operate. In spite of these differences, users want to be able to 
implement enterprise-wide applications that run at their sites worldwide. 
These applications must be able to interoperate across country boundaries, 
run on a variety of hardware configurations from multiple vendors, and be 
localized to meet local users’ needs. The need for this open, distributed
computing environment is the rationale for common open software environments.
The internationalization technology identified within this specification 
provides these benefits to a global market. 


Overview of Internationalization
Locales
Fonts, Font Sets, and Font Lists
Text Drawing
Input Methods
Interclient Communications Conventions (ICCC)


Overview of Internationalization


Multiple environments may exist within a common open system for support 
of different national languages. Each of these national environments is 
called a locale, which considers the language, its characters, fonts, and the 
customs used to input and format data. The Common Desktop Environment 
is fully internationalized such that any application can run using any locale 
installed in the system. 


A locale defines the behavior of a program at run time according to the 
language and cultural conventions of a user’s geographical area. 
Throughout the system, locales affect the following: 


• Encoding and processing of text data

• Identifying the language and encoding of resource files and their text
values

• Rendering and layout of text strings

• Interchanging text that is used for interclient text communication

• Selecting the input method (which code set will be generated) and the
processing of text data

• Encoding and decoding for interclient text communication

• Bitmap/icon files

• Actions and file types

• User Interface Definition (UID) files


An internationalized application contains no code that is dependent on the 
user's locale, the characters needed to represent that locale, or any formats 
(such as date and currency) that the user expects to see and interact with. 
The desktop accomplishes this by separating language- and culture- 
dependent information from the application and saving it outside the 
application. 
Figure 1-1 shows the kinds of information that should be external to an
application to simplify internationalization.


[Figure: the application source code, with the following kinds of
information kept outside it: any string to be displayed (menu items, help
text, prompts, labels), bitmaps, data presentation format, date format,
collation order, currency format, and numeric format.]


Figure 1-1 Information external to the application


By keeping the language- and culture-dependent information separate from 
the application source code, the application does not need to be rewritten or 
recompiled to be marketed in different countries. Instead, the only 
requirement is for the external information to be localized to accommodate 
local language and customs. 


An internationalized application is also adaptable to the requirements of 
different native languages, local customs, and character-string encodings. 
The process of adapting the operation to a particular native language, local 
custom, or string encoding is called localization. A goal of 
internationalization is to permit localization without program source 
modifications or recompilation. 
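
One common way to keep displayed strings outside the source code is the X/Open message catalog interface (catopen() and catgets()). The following is a minimal sketch, not a prescribed desktop mechanism: the catalog name cde_i18n_demo, the set number, and the message number are hypothetical values chosen for illustration. The built-in English text serves only as a fallback when no catalog is installed for the user's locale.

```c
#include <locale.h>
#include <nl_types.h>

/* Hypothetical catalog name and message identifiers (illustration only). */
#define APP_CATALOG   "cde_i18n_demo"
#define MSG_SET       1
#define MSG_GREETING  1

/* Catalog descriptor; (nl_catd)-1 means "no catalog open". */
static nl_catd app_cat = (nl_catd)-1;

/* Select the user's locale and try to open the matching catalog.
 * The catalog stays open so that strings returned by catgets()
 * remain valid for the life of the program. */
void open_messages(void)
{
    setlocale(LC_ALL, "");
    app_cat = catopen(APP_CATALOG, NL_CAT_LOCALE);
}

/* Return the localized text for msg_id, or the fallback text when no
 * catalog (or no such message) is available. */
const char *get_message(int msg_id, const char *fallback)
{
    if (app_cat == (nl_catd)-1)
        return fallback;
    return catgets(app_cat, MSG_SET, msg_id, fallback);
}
```

With this arrangement, a call such as get_message(MSG_GREETING, "Hello") needs no source change when a translated catalog is added later; only the external catalog is localized.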


For a quick overview of internationalization, refer to X/Open CAE
Specification System Interface Definition, Issue 4, X/Open Company Ltd.,
1992, ISBN: 1-872630-46-4.


Current State of Internationalization


Previously, the industry supplied many variants of internationalization,
from proprietary functions to the new set of standard functions published
by X/Open. Also, there have been different levels of enabling, such as
simple ASCII support, Latin/European support, Asian multibyte support,
and Arabic/Hebrew bidirectional support.


The interfaces defined within the X/Open specification are capable of 
supporting a large set of languages and territories, including: 


Script           Description

Latin Language   Americas, Eastern/Western European
Greek            Greece
Turkish          Turkey
East Asia        Japanese, Korean, and Chinese
Indic            Thai
Bidirectional    Arabic and Hebrew


Furthermore, the goal of the Common Desktop Environment is that 
localization of these technologies (translation of messages and 
documentation and other adaptation for local needs) be done in a consistent 
way, so that a supported user anywhere in the world will find the same 
common localized environment from vendor to vendor. End users and 
administrators can expect a consistent set of localization features that 
provide a complete application environment for support of global software. 


Internationalization Standards


Through the work of many companies, the functionality of the 
internationalization application program interface has been standardized 
over time to include additional requirements and languages, particularly 
those of East Asia. This work has been centered primarily in the Portable 
Operating System Interface for Computer Environments (POSIX) and
X/Open specifications. The original X/Open specification was published in
the second edition of the X/Open Portability Guide (XPG2) and was based
on the Native Language Support product released by Hewlett-Packard. The 
latest published X/Open internationalization standard is referred to as 
XPG4. 


It is important that each layer within the desktop use the proper set of
standard interfaces defined for internationalization to ensure that end users
get a consistent, localized interface. The definition of a locale and the
common open set of locale-dependent functions are based on the following 
specifications: 


• X Window System, The Complete Reference to Xlib, Xprotocol, ICCCM,
XLFD - X Version 11, Release 5, Digital Press, 1992, ISBN 1-55558-088-2.


• ANSI/IEEE Standard Portable Operating System Interface for Computer
Environments, IEEE.


• OSF™ Motif 1.2 Programmer’s Reference, Revision 1.2, Open Software
Foundation, Prentice Hall, 1992, ISBN 0-13-643115-1.


• X/Open CAE Specification Commands and Utilities, Issue 4, X/Open
Company Ltd., 1992, ISBN 1-872630-48-0.


Within this environment, software developers can expect to develop 
worldwide applications that are portable, can interoperate across 
distributed systems (even from different vendors), and can meet the diverse 
language and cultural requirements of multinational users supported by 
the desktop standard locales. 


Common Internationalization System


Figure 1-2 on page 6 shows a view of how internationalization is pervasive 
across a specific single-host system. The goal is that the applications 
(clients) are built to be shipped worldwide for the set of locales supported in 
the underlying system. Using standard interfaces improves access to global 
markets and minimizes the amount of localization work needed by 
application developers. In addition, country representatives can be assured
of consistent localization within systems adhering to the principles of the
desktop.

[Figure: layered view of a single-host system. Clients (editors, system
utilities, managers, and applications) sit on top of the GUI layer
(XmString drawing, VendorShell geometry management, XmFontList) and Xlib
(interclient communication, XIM protocol, input method and output method
subsystems). Beneath these, the internationalization framework comprises
the locale subsystem (ISO8859, PC codes, EUC, and others), the conversion
subsystem (for example, PC code to and from ISO 8859-1), and the input
method engine.]


Figure 1-2 Common internationalized system
Locales


Most single-display clients operate in a single locale that is determined at
run time from the setting of an environment variable, which is usually
$LANG, or from the xnlLanguage resource. Locale environment variables, such
as LC_ALL, LC_CTYPE, and LANG, can be used to control the environment.
See “Xt Locale Management” on page 104 for more information.


The LC_CTYPE category of the locale is used by the environment to identify 
the locale-specific features used at run time. The fonts and input method 
loaded by the toolkit are determined by the LC_CTYPE category. 


Programs that are enabled for internationalization are expected to call the
XtSetLanguageProc() function (which calls setlocale() by default) to
set the locale desired by the user. None of the libraries call the
setlocale() function to set the locale, so it is the responsibility of the
application to call XtSetLanguageProc() with either a specific locale or
some value loaded at run time. An internationalized application that does
not use XtSetLanguageProc() should obtain the locale name from one of the
following prioritized sources and pass it to the setlocale() function:


• A command-line option
• A resource
• The empty string (“”)
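
The priority order above can be sketched in C as follows. This is an illustration under stated assumptions rather than toolkit code: the function names choose_locale_name() and init_locale() are hypothetical, and the caller is assumed to have already extracted the command-line and resource values (a NULL or empty value is treated as "not supplied").

```c
#include <locale.h>
#include <stddef.h>

/* Pick the locale name from the prioritized sources: the command-line
 * option first, then the resource value, then the empty string.
 * Assumption: NULL or "" means the source was not supplied. */
const char *choose_locale_name(const char *cmdline_value,
                               const char *resource_value)
{
    if (cmdline_value != NULL && cmdline_value[0] != '\0')
        return cmdline_value;
    if (resource_value != NULL && resource_value[0] != '\0')
        return resource_value;
    return "";   /* let setlocale() consult the environment */
}

/* Set all locale categories from the chosen name. Returns the name of
 * the locale now in effect, or NULL if the system rejected the request. */
const char *init_locale(const char *cmdline_value,
                        const char *resource_value)
{
    return setlocale(LC_ALL,
                     choose_locale_name(cmdline_value, resource_value));
}
```

An application would call init_locale() once at startup, before creating any widgets or loading any fonts.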


The empty string makes the setlocale() function use the $LC_* and
$LANG environment variables to determine locale settings. Specifically,
setlocale(LC_ALL, "") specifies that the locale should be checked and taken
from environment variables in the order shown in Table 1-1 for the various
locale categories.


Table 1-1 Locale Categories


Category        1st Env. Var.   2nd Env. Var.   3rd Env. Var.
LC_CTYPE:       LC_ALL          LC_CTYPE        LANG
LC_COLLATE:     LC_ALL          LC_COLLATE      LANG
LC_TIME:        LC_ALL          LC_TIME         LANG
LC_NUMERIC:     LC_ALL          LC_NUMERIC      LANG
LC_MONETARY:    LC_ALL          LC_MONETARY     LANG
LC_MESSAGES:    LC_ALL          LC_MESSAGES     LANG
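
The lookup order in Table 1-1 can be modeled with a short C function. This is only a sketch of the precedence rule, not the system's setlocale() implementation: the function name effective_locale() is hypothetical, an empty variable is treated as unset, and the fallback to "C" when nothing is set is an assumption (the actual default is implementation-defined).

```c
#include <stdlib.h>

/* Return the locale name that applies to one category, following the
 * order in Table 1-1: LC_ALL, then the category's own variable (for
 * example "LC_TIME"), then LANG. */
const char *effective_locale(const char *category_var)
{
    const char *vars[3];
    int i;

    vars[0] = "LC_ALL";
    vars[1] = category_var;
    vars[2] = "LANG";

    for (i = 0; i < 3; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0')
            return value;   /* first variable that is set wins */
    }
    return "C";             /* assumed default when nothing is set */
}
```

For example, with LANG=fr_FR and LC_TIME=de_DE, effective_locale("LC_TIME") yields de_DE while effective_locale("LC_COLLATE") yields fr_FR; setting LC_ALL overrides both.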
The toolkit already defines a standard command-line option (-lang) and a
resource (xnlLanguage). Also, the resource value can be set in the server
RESOURCE_MANAGER, which may affect all clients that connect to that
server.


Fonts, Font Sets, and Font Lists 


All X clients use fonts for drawing text. The basic object used in drawing
text is XFontStruct, which identifies the font that contains the images to
be drawn.


The desktop already supports fonts by way of the XFontStruct data
structure defined by Xlib; yet, the encoding of the characters within the
font must be known to an internationalized application. To communicate
this information, the program expects that all fonts at the server are
identified by an X Logical Font Description (XLFD) name. The XLFD name
enables users to describe both the base characteristics and the charset
(encoding of font glyphs). The term charset is used to denote the encoding of
glyphs within the font, while the term code set means the encoding of
characters within the locale. The charset for a given font is determined by
the CharSetRegistry and CharSetEncoding fields of the XLFD name. Text
and symbols are drawn as defined by the codes in the fonts.
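
To make the role of the CharSetRegistry and CharSetEncoding fields concrete, the following sketch extracts them from a fully specified XLFD name. It relies on the XLFD convention that a full name has 14 fields, each introduced by a hyphen, with the registry and encoding last. The function name xlfd_charset() is hypothetical; it is not an Xlib call, and it does not handle abbreviated patterns in which a wildcard stands for several fields.

```c
#include <stddef.h>

#define XLFD_FIELDS 14   /* fields in a fully specified XLFD name */

/* Return a pointer to the "Registry-Encoding" part of a full XLFD name
 * (for example "iso8859-1"), or NULL if the name does not contain
 * exactly 14 hyphen-introduced fields. */
const char *xlfd_charset(const char *xlfd)
{
    const char *charset = NULL;
    const char *p;
    int hyphens = 0;

    if (xlfd == NULL || xlfd[0] != '-')
        return NULL;

    for (p = xlfd; *p != '\0'; p++) {
        if (*p == '-') {
            hyphens++;
            /* The 13th hyphen introduces the CharSetRegistry field. */
            if (hyphens == XLFD_FIELDS - 1)
                charset = p + 1;
        }
    }
    return (hyphens == XLFD_FIELDS) ? charset : NULL;
}
```

Applied to -dt-application-medium-r-normal-serif-14-140-75-75-p-0-iso8859-1, the function points at iso8859-1, which is the charset of the font and is not necessarily the code set of the locale.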


A font set (for example, an XFontSet data structure defined by Xlib) is a
collection of one or more fonts that enables all characters defined for a
given locale to be drawn. Internationalized applications may be required to
draw text encoded in the code sets of the locale where the value of an
encoded character is not identical to the glyph index. Additionally, multiple
fonts may be required to render all characters of the locale using one or
more fonts whose encodings may be different from the code set of the locale.
Since both code sets and charsets may vary from locale to locale, the
concept of a font set is introduced through XFontSet.


While fonts are identified by their XLFD name, font sets are identified by a 
list of XLFD names. The list can consist of one or more XLFD names with 
the exception that only the base characteristics are significant; the 
encoding of the desired fonts is determined from the locale. Any charsets 
specified in the XLFD base name list are ignored and users need only 
concentrate on specifying the base characteristics, such as point size, style, 
and weight. A font set is said to be locale sensitive and is used to draw text
that is encoded in the code set of the locale. Internationalized applications 
should use font sets instead of font structs to render text data. 


A font list is a libXm Toolkit object that is a collection of one or more font
list entries. Font sets can be specified within a font list. Each font list entry
designates either a font or a font set and is tagged with a name. If there is
no tag in a font list entry, a default tag (XmFONTLIST_DEFAULT_TAG) is
used. The font list can be used with the XmString functions found in the
libXm Toolkit library. A font list enables drawing of compound strings that
consist of one or more segments, each identified by a tag. This allows the
drawing of strings with different base characteristics (for example, drawing
a bold and italic string within one operation). Some non-XmString-based
widgets, such as XmText of the libXm library, use only one font list entry in
the font list. Motif font lists use the suffix : (colon) to identify a font set
within a font list.


The user is generally asked to specify either a font list (which may contain 
either a font or font set) or a font set. In an internationalized environment, 
the user must be able to specify fonts that are independent of the code set 
because the specification can be used under various locales with different 
code sets than the character set (charset) of the font. Therefore, it is 
recommended that all font lists be specified with a font set. 


Font Specification


The font specification can be either an X Logical Font Description
(XLFD) name or an alias for the XLFD name. For example, the following
are valid font specifications for a 14-point font:


-dt-application-medium-r-normal-serif-*-*-*-*-p-*-iso8859-1


OR
-*-r-*-14-*iso8859-1


Font Set Specification


The font set specification is a list of names (XLFD names or their aliases) 
and is sometimes called a base name list. All names are separated by 
commas, with any blank spaces before or after the comma being ignored. 
Pattern-matching (wildcard) characters can be specified to help shorten 
XLFD names. 
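
The separator rule for a base name list (commas, with surrounding blanks ignored) can be illustrated with a small parser. This is a sketch for explanation only, with a hypothetical function name and fixed buffer size; Xlib performs its own parsing when a base name list is passed to it.

```c
#include <string.h>

#define MAX_NAME_LEN 256   /* arbitrary limit chosen for this sketch */

/* Split a comma-separated base name list into individual names,
 * dropping blanks and tabs on either side of each comma. Empty names
 * and names that do not fit in MAX_NAME_LEN are skipped.
 * Returns the number of names stored (at most max_names). */
int split_base_name_list(const char *list,
                         char names[][MAX_NAME_LEN], int max_names)
{
    const char *start = list;
    int count = 0;

    while (count < max_names) {
        const char *comma = strchr(start, ',');
        const char *s = start;
        const char *e = (comma != NULL) ? comma : start + strlen(start);
        size_t len;

        while (s < e && (*s == ' ' || *s == '\t'))
            s++;                       /* trim leading blanks */
        while (e > s && (e[-1] == ' ' || e[-1] == '\t'))
            e--;                       /* trim trailing blanks */

        len = (size_t)(e - s);
        if (len > 0 && len < MAX_NAME_LEN) {
            memcpy(names[count], s, len);
            names[count][len] = '\0';
            count++;
        }
        if (comma == NULL)
            break;                     /* last name processed */
        start = comma + 1;
    }
    return count;
}
```

Given the list "-dt-*-medium-*-24-*-m-* , -*-courier-*", the function stores two names with the blanks around the comma removed.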
Remember that a font set specification is determined by the locale that is
running. For example, the ja_JP Japanese locale defines three fonts
(character sets) necessary to display all of its characters; the following
identifies the set of Gothic fonts needed.


• Example of full XLFD name list:


-dt-mincho-medium-r-normal--14-*-*-m-*-jisx0201.1976-0,
-dt-mincho-medium-r-normal--28-*-*-*-m-*-jisx0208.1983-0:


• Example of single XLFD pattern name:


-dt-*-medium-*-24-*-m-*:


The preceding two cases can be used with a Japanese locale as long as fonts
exist that match the base name list.


Font List Specification


A font list specification can consist of one or more entries, each of which can 
be either a font specification or a font set specification. 


Each entry can be tagged with a name that is used when drawing a 
compound string. The tags are application-defined and are usually names 
representing the expected style of font; for example, bold, italic, 
bigbold. A null tag is used to denote the default entry and is associated 
with the XmFONTLIST_DEFAULT_TAG identifier used in XmString 
functions. 


A font tag is identified when it is prefixed with an = (equal sign); for
example, =bigbold (this matches the first font defined at the server). If an
= is specified but there is no name following it, the specification is
considered the default font list entry.


A font set tag is identified when it is prefixed with a : (colon); for example,
:bigbold (this matches the first server set of fonts that satisfy the locale).
If a : is specified but no name is given, the specification is considered the
default font list entry. Within a font list entry specification, a base name
list is separated by ; (semicolons) rather than by , (commas).
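
The tagging rules can be modeled with a small classifier for a single font list entry. This is an explanatory sketch, not the libXm resource converter: the type and function names are hypothetical, the buffer sizes are arbitrary, and blanks are not trimmed. An entry containing a colon is treated as a font set, an entry containing an equal sign as a font, and an empty tag stands for the default entry.

```c
#include <string.h>

typedef enum { ENTRY_FONT, ENTRY_FONT_SET } EntryKind;

typedef struct {
    EntryKind kind;
    char names[256];  /* font name, or ;-separated base name list */
    char tag[64];     /* "" stands for the default font list entry */
} FontListEntry;

/* Classify one font list entry and split it into names and tag.
 * Returns 0 on success, -1 if a part does not fit in the buffers. */
int parse_font_list_entry(const char *entry, FontListEntry *out)
{
    const char *sep = strchr(entry, ':');   /* colon marks a font set */
    const char *tag;
    size_t name_len;

    out->kind = ENTRY_FONT_SET;
    if (sep == NULL) {
        out->kind = ENTRY_FONT;
        sep = strchr(entry, '=');           /* equal sign marks a font tag */
    }

    name_len = (sep != NULL) ? (size_t)(sep - entry) : strlen(entry);
    tag = (sep != NULL) ? sep + 1 : "";

    if (name_len >= sizeof out->names || strlen(tag) >= sizeof out->tag)
        return -1;

    memcpy(out->names, entry, name_len);
    out->names[name_len] = '\0';
    strcpy(out->tag, tag);
    return 0;
}
```

For instance, the entry -*-b-*-18-*:bigbold is classified as a font set tagged bigbold, while fixed=italic is a single font tagged italic.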


Example Font List Specification


For the Latin 1 locales, enter:


-*-r-*-14-*:,\         # default font list entry
-*-b-*-18-*:bigbold    # Large Bold fonts


Base Font Name List Specification


The base font name list is a list of base font names associated with a font 
set as defined by the locale. The base font names are in a comma-separated 
list and are assumed to be characters from the portable character set; 
otherwise, the result is undefined. Blank space immediately on either side 
of a separating comma is ignored. 


Use of XLFD font names permits international applications to obtain the 
fonts needed for a variety of locales from a single locale-independent base 
font name. The single base font name specifies a family of fonts whose 
members are encoded in the various charsets needed by the locales of 
interest. 


An XLFD base font name can explicitly name the font’s charset needed for 
the locale. This enables the user to specify an exact font for use with a 
charset required by a locale, fully controlling the font selection. 


If a base font name is not an XLFD name, an attempt is made to obtain an 
XLFD name from the font properties for the font. 


The following algorithm is used to select the fonts that are used to display 
text with font sets. 


For each charset required by the locale, the base font name list is searched 
for the first of the following cases that names a set of fonts that exist at the 
server. 


• The first XLFD-conforming base font name that specifies the required
charset or a superset of the required charset in its CharSetRegistry and
CharSetEncoding fields.


• The first set of one or more XLFD-conforming base font names that
specify one or more charsets that can be remapped to support the
required charset. The Xlib implementation can recognize various
mappings from a required charset to one or more other charsets and use
the fonts for those charsets. For example, JIS Roman is ASCII with the
\ (backslash) and ~ (tilde) characters replaced by the yen and overbar
characters, respectively; Xlib can load an ISO8859-1 font to support this
character set if a JIS Roman font is not available.

• The first XLFD-conforming font name, or the first non-XLFD font name
for which an XLFD font name can be obtained, combined with the
required charset (replacing the CharSetRegistry and CharSetEncoding
fields in the XLFD font name). In the first instance, the implementation
can use a charset that is a superset of the required charset.


• The first font name that can be mapped in some locale-dependent
manner to one or more fonts that support imaging text in the charset.
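
The search order can be illustrated with a deliberately simplified model that covers only two of the cases above: an exact charset match in a full XLFD base name, and substitution of the required charset into the first XLFD-style name. It is a sketch under stated assumptions, not the Xlib algorithm: the function names are hypothetical, superset and remapping rules are omitted, non-XLFD names are ignored, and no check is made that the resulting font exists at the server.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Point at the "Registry-Encoding" part of a 14-field XLFD name,
 * or return NULL when the name has a different number of fields. */
static const char *charset_fields(const char *name)
{
    const char *fields = NULL;
    const char *p;
    int hyphens = 0;

    for (p = name; *p != '\0'; p++)
        if (*p == '-' && ++hyphens == 13)
            fields = p + 1;
    return (hyphens == 14) ? fields : NULL;
}

/* Build in out[] the font pattern to request for one required charset.
 * Returns 1 if a base name names the charset explicitly, 3 if the
 * charset was substituted into (or appended to) the first XLFD-style
 * name, and 0 if nothing usable was found or out[] is too small. */
int select_font_pattern(const char *const names[], int n,
                        const char *charset, char *out, size_t out_size)
{
    int i, len;

    /* Exact match on the charset fields (case-insensitive). */
    for (i = 0; i < n; i++) {
        const char *cs = charset_fields(names[i]);
        if (cs != NULL && strcasecmp(cs, charset) == 0) {
            len = snprintf(out, out_size, "%s", names[i]);
            return (len >= 0 && (size_t)len < out_size) ? 1 : 0;
        }
    }

    /* Otherwise combine the first XLFD-style name with the charset. */
    for (i = 0; i < n; i++) {
        const char *cs;

        if (names[i][0] != '-')
            continue;                 /* non-XLFD names are not modeled */
        cs = charset_fields(names[i]);
        if (cs != NULL)               /* replace the existing charset */
            len = snprintf(out, out_size, "%.*s%s",
                           (int)(cs - names[i]), names[i], charset);
        else                          /* name omits the charset fields */
            len = snprintf(out, out_size, "%s-%s", names[i], charset);
        return (len >= 0 && (size_t)len < out_size) ? 3 : 0;
    }
    return 0;
}
```

A caller would invoke select_font_pattern() once for each charset the locale requires, which mirrors the per-charset search described above.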


For example, assume a locale requires the following charsets: 


• ISO8859-1

• JISX0208.1983

• JISX0201.1976

• GB2312-1980.0

You can supply a base font name list that explicitly specifies the charsets, 
ensuring that specific fonts are used if they exist, as shown in the following 
example: 


"-dt-mincho-Medium-R-Normal-*-*-*-*-*-M-*-JISX0208.1983-0,\
-dt-mincho-Medium-R-Normal-*-*-*-*-*-M-*-JISX0201.1976-1,\
-dt-song-Medium-R-Normal-*-*-*-*-*-M-*-GB2312-1980.0,\
-*-default-Bold-R-Normal-*-*-*-*-*-M-*-ISO8859-1"


You can supply a base font name list that omits the charsets, which selects 
fonts for each required code set, as shown in the following example: 


"-dt-Fixed-Medium-R-Normal-*-*-*-*-*-M-*,\
-dt-Fixed-Medium-R-Normal-*-*-*-*-*-M-*,\
-dt-Fixed-Medium-R-Normal-*-*-*-*-*-M-*,\
-*-Courier-Bold-R-Normal-*-*-*-*-*-M-*"


Alternatively, the user can supply a single base font name that selects from 
all available fonts that meet certain minimum XLFD property 
requirements, as shown in the following example: 


"-*-*-*-R-Normal-*-*-*-*-*-M-*"


The desktop provides various functions for rendering localized text, 
including simple text, compound strings, and some widgets. These include 
functions within the Xlib and Motif libraries. 
Input Methods


The Common Desktop Environment provides the ability to enter localized
input for an internationalized application that is using the Xm Toolkit.
Specifically, the XmText[Field] widgets are enabled to interface with
input methods provided by each locale. In addition, the dtterm client is
enabled to use input methods.


By default, each internationalized client that uses the libXm Toolkit uses
the input method associated with a locale specified by the user. The
XmNinputMethod resource is provided as a modifier on the locale name to
allow a user to specify any alternative input method.


The user interface of the input method consists of several elements. The 
need for these areas is dependent on the input method being used. They are 
usually needed by input methods that require complex input processing
and dialogs. See Figure 1-3 for an illustration of these areas.


[Figure: an ApplicationShell (VendorShell) window containing a MainWindow
area with a Label widget and a Text widget, the preedit area, the status
area, and an auxiliary area (ZENKOUHO).]


Figure 1-3 Example of VendorShell widget with auxiliary (Japanese)
Preedit Area


A preedit area is used to display the string being preedited. The input
method supports four modes of preediting: OffTheSpot, OverTheSpot
(default), Root, and None.


Note - A string that has been committed cannot be reconverted. The status
of the string is moved from the preedit area to the location where the user
is entering characters.


OffTheSpot


In OffTheSpot mode preediting using an input method, the location of
preediting is fixed at just below the MainWindow area and on the right side
of the status area as shown in Figure 1-4. A Japanese input method is used
for the example.


Figure 1-4 Example of OffTheSpot preediting with the VendorShell widget
(Japanese)


In the system environment, when preediting using an input method, the 
preedit string being preedited may be highlighted in some form depending 
on the input method. 


To use OffTheSpot mode, set the XmNpreeditType resource of the
VendorShell widget either with the XtSetValues() function or with a
resource file. The XmNpreeditType resource can also be set as the resource
of a TopLevelShell, ApplicationShell, or DialogShell widget, all of
which are subclasses of the VendorShell widget class.


OverTheSpot (Default)


In OverTheSpot mode, the location of the preedit area is set to where the 
user is trying to enter characters (for example, the insert cursor position of 
the Text widget that has the current focus). The characters in a preedit 
area are displayed at the cursor position as an overlay window, and they 
can be highlighted depending on the input method. 
Although a preedit area may consist of multiple lines in OverTheSpot
mode, the preedit area is always within the MainWindow area and cannot
cross its edges in any direction.


Keep in mind that although the preedit string under construction may be
displayed as though it were part of the Text widget’s text, it is not passed
to the client and displayed in the underlying edit screen until preedit ends.
See Figure 1-5 on page 17 for an illustration.


To use OverTheSpot mode explicitly, set the XmNpreeditType resource of
the VendorShell widget either with the XtSetValues() function or with
a resource file. The XmNpreeditType resource can be set as the resource of
a TopLevelShell, ApplicationShell, or DialogShell widget because
these are subclasses of the VendorShell widget class.


Figure 1-5 Example of OverTheSpot preediting with the VendorShell widget
(Japanese)
Root


In Root mode, the preedit and status areas are located separate from the
client’s window. The Root mode behavior is similar to OffTheSpot. See
Figure 1-6 for an illustration.


Figure 1-6 Example of Root preediting with the VendorShell widget (Japanese)


Status Area


A status area reports the input or keyboard status of the input method to 
the users. For OverTheSpot and OffTheSpot styles, the status area is 
located at the lower left corner of the VendorShell window. 


• In Root style, the status area is placed outside the client window.


• If the preedit style is OffTheSpot mode, the preedit area is displayed to
the right of the status area.


The VendorShell widget provides geometry management so that a status 
area is rearranged at the bottom corner of the VendorShell window if the 
VendorShell window is resized. 
Auxiliary Area


An auxiliary area helps the user with preediting. Depending on the
particular input method, an auxiliary area can be created. The Japanese
input method in Figure 1-3 on page 14 creates the following types of
auxiliary areas:


• ZENKOUHO
• JISNUMBER
• Switching conversion method
• SAKIYOMI-REN-BUNSETSU
• IKKATSU-REN-BUNSETSU
• TAN-BUNSETSU
• FUKUGOU-GO


MainWindow Area 


A MainWindow area is the widget used as the working area of the input 
method. In the system environment, the sole child of the VendorShell 
widget is the MainWindow widget. It can be any container widget, such as 
a RowColumn widget. The user creates the container widget as the child of
the VendorShell widget.


Focus Area 


A focus area is any descendant widget under the MainWindow widget 
subtree that currently has focus. The Motif application programmer using 
existing widgets does not need to worry about the focus area. The 
important information to remember is that only one widget can have input 
method processing at a time. The input method processing moves to the 
window (widget) that currently has the focus. 


Interclient Communications Conventions (ICCC)


The Interclient Communications Conventions (ICCC) defines the
mechanism used to pass text between clients. Because the system is
capable of supporting multiple code sets, it may be possible that two
applications that are communicating with each other are using different
code sets. ICCC defines how these two clients agree on how the data is
passed between them. If two clients have incompatible character sets (for
example, Latin1 and Japanese (JIS)), some data may be lost when
characters are transported.


However, if two clients have different code sets but compatible character
sets, ICCC enables these clients to pass information with no data lost. If
code sets of the two clients are not identical, CompoundText encoding is
used as the interchange with the COMPOUND_TEXT atom used. If data being
communicated involves only portable characters (7-bit, ASCII, and others)
or the ISO8859-1 code set, the data is communicated as is with no
conversion by way of the XA_STRING atom.
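
The choice between the two atoms can be sketched as a simple test on the
text. The helper below is illustrative only: the name is_portable_text is
hypothetical, and it checks only for 7-bit characters, not for the
ISO8859-1 case. In practice, the Xlib function XmbTextListToTextProperty()
called with the XStdICCTextStyle style makes this choice on the caller's
behalf.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of s is a 7-bit
 * character, so the text could be passed unconverted as STRING;
 * returns 0 if conversion to COMPOUND_TEXT would be needed. */
int is_portable_text(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; ++p) {
        if (*p > 0x7f)
            return 0;
    }
    return 1;
}
```

A client could call such a test on a window title before deciding which
atom to use when setting the title property.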


Titles and icon names need to be communicated to the Window Manager 
using the COMPOUND_TEXT atom if nonportable characters are used; 
otherwise, the XA_STRING atom can be used. Any other encoding is limited 
to the ability to convert to the locale of the Window Manager. The Window 
Manager runs in a single locale and supports only titles and icon names 
that are convertible to the code set of the locale under which it is running. 


The libXm library and all desktop clients should follow these conventions. 
Internationalization and the
Common Desktop Environment 2


Multiple environments may exist within a common open system for support
of different national languages. Each of these national environments is
called a locale, which considers the language, its characters, fonts, and the
customs used to input and format data. The Common Desktop Environment
is fully internationalized such that any application can run using any locale
installed in the system.


Locale Management 21
Font Management 23
Drawing Localized Text 31
Inputting Localized Text 34
Extracting Localized Text 41
Localized Resources 45
Operating System Internationalized Functions 52


Locale Management


For the desktop, most single-display clients operate in a single locale that
is determined at run time from the setting of the locale environment
variable, which is usually $LANG. The Xm library (libXm) can only support a
single locale, the one in effect at the time each widget is instantiated.
Changing the locale after the Xm library has been initialized may cause
unpredictable behavior.


All internationalized programs should set the locale desired by the user as
defined in the locale environment variables. For programs using the
desktop toolkit, the programs call the XtSetLanguageProc() function
prior to calling any toolkit initialization function; for example,
XtAppInitialize(). This function does all of the initialization necessary
prior to the toolkit initialization. For nondesktop programs, the programs
call the setlocale() function to set the locale desired by the user at the
beginning of the program.
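
For a nondesktop program, the call can be sketched as follows. The helper
name init_user_locale is hypothetical and the warning text is only an
illustration; the one essential step is calling setlocale() with an empty
locale string, which selects the locale named by the environment variables.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper for a nondesktop program: adopt the locale named
 * by the user's environment (LC_ALL, LC_CTYPE, LANG, and so on).
 * Returns the name of the locale in effect, or NULL on failure. */
const char *init_user_locale(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL) {
        /* The requested locale is not installed; the program keeps
         * running in the default "C" locale. */
        fprintf(stderr, "warning: locale not supported, using \"C\"\n");
    }
    return name;
}
```

A desktop program would instead call XtSetLanguageProc(NULL, NULL, NULL)
before XtAppInitialize(), as described above.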


Locale environment variables (for example, LC_ALL, LC_CTYPE, and LANG) 
are used to control the environment. Users should be aware that the 
LC_CTYPE category of the locale is used by the X and Xm libraries to 
identify the locale-specific features used at run time. Yet, the LC_MESSAGES 
category is used by the message catalog services to load locale-specific text. 
Refer to “Extracting Localized Text” on page 41 for more information. 
Specifically, the fonts and input method loaded by the toolkit are 
determined by the setting of the LC_CTYPE category. 


String encoding (for example, ISO8859-1 or Extended UNIX Code (EUC)) in
an application's source code, resource files, and User Interface Language
(UIL) files should be the same as the code set of the locale where the
application runs. If not, code conversion is required.


All components are shipped as a single, worldwide executable and are
required to support the R5 sample implementation set of locales:
US, Western/Eastern Europe, Japan, Korea, China, and Taiwan.


Applications should be written so that they are code-set-independent and 
include support for any multibyte code set. 


The following are the functions used for locale management: 


* XtSetLanguageProc()
* setlocale()
* XSupportsLocale()
* XSetLocaleModifiers()
Font Management


When rendering text in an X Windows™ client, at least two aspects are 
sensitive to internationalization: 


* Obtaining the localized text itself

* Selecting the one or more fonts that contain all the glyphs needed to
render the characters in the localized text


“Extracting Localized Text” on page 41 describes how to obtain the
localized text. This section describes how to choose the correct fonts to
render localized text.


Matching Fonts to Character Sets


A font contains a set of glyphs used to render the characters of a locale. 
However, you may also want to do the following for a given locale: 


* Determine the fonts needed
* Specify the necessary fonts
* Determine the charset of a font in a resource file
* Choose multiple fonts per locale

The last two fields of a font XLFD identify which glyphs are contained in a
font and which value is used to obtain a specific glyph from the set. These
last two fields identify the encoding of the glyphs contained in the font.


For example: 
-adobe-courier-medium-r-normal--24-240-75-75-m-150-iso8859-1


The last two fields of this XLFD name are iso8859 and 1. These fields
specify that the ISO8859-1 standard glyphs are contained in the font.
Further, they specify that the character code values of the ISO8859-1
standard are used to index the corresponding glyph for each character.
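
As an illustration, the charset portion of an XLFD name can be picked out
with ordinary string handling. The function below is a minimal sketch with
a hypothetical name (xlfd_charset); it is not part of Xlib, and it assumes
the name is a complete XLFD in which the last two hyphen-separated fields
are the charset registry and encoding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last two fields of an
 * XLFD name (charset registry and encoding, for example "iso8859-1"),
 * or NULL if the name contains fewer than two hyphens. */
const char *xlfd_charset(const char *xlfd)
{
    const char *last = strrchr(xlfd, '-');
    size_t i;

    if (last == NULL)
        return NULL;

    /* Scan backward from the last hyphen for the hyphen before it. */
    i = (size_t)(last - xlfd);
    while (i > 0) {
        --i;
        if (xlfd[i] == '-')
            return xlfd + i + 1;
    }
    return NULL;
}
```

For the Courier name above, the function returns a pointer to the
substring iso8859-1.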


The font charset used by the application to render data depends on the
locale you select. Because the font charset of the data changes with
the choice of locale, the font specification must not be hardcoded by the
application. Instead, it should be placed in a locale-specific app-defaults
file, allowing localized versions of the app-defaults file to be created.
Further, the font should be specified as a fontset. A fontset is an Xlib
concept in which an XLFD is used to specify the fonts. The font charset 
fields of the XLFD are specified by the Xlib code that creates the fontset 
and fills in these fields based on the locale that the user has specified. 


For many languages (such as Japanese, Chinese, and Korean), multiple
font charsets are combined to support a single encoding. In these cases,
multiple fonts must be opened to render the character data. Further, the
data must be parsed into segments that correspond to each font, and in
some cases, these segments must be transformed to convert the character
values into glyph indexes. The XFontSet, which is a collection of all fonts
necessary to render character data in a given locale, also deals with this set
of problems. Further, a set of rendering and metric routines are provided
that internally take care of breaking strings into character-set-consistent
segments and transforming values into glyph indexes. These routines
relieve the burden of the application developer, who needs only to use
fontsets and the new X11R5 rendering and metric application program
interfaces (APIs).


Font Objects


This section describes the following font objects:


* Font sets
* Fonts
* Font lists


Font Sets 


Generally, all internationalized programs expecting to draw localized text
using Xlib are required to use an XFontSet for specifying the
locale-dependent fonts. Specific fonts within a font set should be specified
using XLFD naming conventions without the charset field specified. The
resource name for an XFontSet is *fontSet. Refer to “Localized Resources” on
page 45 for a list of font resources.


Applications directly using Xlib to render text (as opposed to using
XmString functions or widgets) may take advantage of the string-to-fontSet
converter provided by Xt. For example, the following code fragment
shows how to obtain a fontset when using Xt and when not using Xt:


/* pardon the double negative... means "If using Xt..." */
#ifndef NO_XT
typedef struct {
    XFontSet fontset;
    char *foo;
} ApplicationData, *ApplicationDataPtr;

static XtResource my_resources[] = {
    { XtNfontSet, XtCFontSet, XtRFontSet, sizeof (XFontSet),
      XtOffset (ApplicationDataPtr, fontset), XtRString,
      "*-18-*" }  /* default pattern; assumed to match the non-Xt branch */
};
#endif /* NO_XT */


#ifdef NO_XT
    /* Without Xt: create the fontset directly from a base name pattern. */
    fontset = XCreateFontSet (dpy, "*-18-*", &missing_charsets,
                              &num_missing_charsets, &default_string);
    if (num_missing_charsets > 0) {
        (void) fprintf(stderr, "%s: missing charsets.\n",
                       program_name);
        XFreeStringList (missing_charsets);
    }
#else
    /* With Xt: let the string-to-fontSet converter build the fontset. */
    XtGetApplicationResources (toplevel, &data, my_resources,
                               XtNumber (my_resources), NULL, 0);
    fontset = data.fontset;
#endif /* NO_XT */


Fonts 


Internationalized programs should avoid using fonts directly, that is,
XFontStruct, unless they are being used for a specific charset and a
specific character set. Use of XFontStruct may be limiting if the server
you are connecting to does not support the specific charsets needed by a
locale. The resource name for an XFontStruct is *font.


Font Lists 


All programs using widgets or XmString to draw localized text are
required to specify an XmFontList for specifying fonts. A font list is a
list of one or more fontsets or fonts, or both. It is used to convey the list of
fonts and fontsets a widget should use to render text. For more complicated
applications, a font list may specify multiple font sets with each font set
being tagged with a name; for example, Bold, Large, Small, and so on. The
tags are to be associated with a tag of an XmString segment. A tag may be
used to identify a specific font or fontset within a font list.

Font Set and Font List Syntax


Table 2-1 shows the syntax for a font set and font list. 


Table 2-1 Font Set and Font List Syntax


Resource Type        XLFD Separator   Terminator   FontEntry Separator

*fontSet: (Xlib)     comma            None         None
*fontList: (Motif)   semicolon        colon        comma


Here are some examples of font resource specifications: 


app_foo*fontList: -adobe-courier-medium-r-normal--24-240-75-75-m-\ 
150-*: 


The preceding fontList specifies a fontset, consisting of one or more 24- 
point Adobe Courier fonts, as appropriate for the user’s locale. 


app_foo*fontList: -adobe-courier-medium-r-normal--18-*; *-gothic-\
*-18-*:


This fontList specifies a fontset consisting of an 18-point Courier font (if
available) for some characters in the user's data, and an 18-point Gothic
font for the others.


Motif-based applications sometimes need direct access to the font set 
contained in a font list. For example, an application that uses a 
DrawingArea widget may want to label one of the images drawn there. 
The following sample code shows how to extract a font set from a font list. 
In this example, the code looks for the font set tagged with
XmFONTLIST_DEFAULT_TAG because this is the tag that says “codeset of the
locale.” Applications should use the tag XmFONTLIST_DEFAULT_TAG for any
string that could contain localized data.


CDE: Internationalization Programmer’s Guide 




#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <Xm/Xm.h>

XFontSet FontList2FontSet( XmFontList fontlist)
{
    XmFontContext context;
    XmFontListEntry next_entry;
    XmFontType type_return = XmFONT_IS_FONT;
    char* font_tag;
    XFontSet fontset;
    XFontSet first_fontset = NULL;
    Boolean have_font_set = False;

    if ( !XmFontListInitFontContext(&context, fontlist)) {
        XtWarning("fl2fs: can't create fontlist context...");
        exit(1);
    }

    while ((next_entry = XmFontListNextEntry(context)) != NULL) {
        fontset = (XFontSet) XmFontListEntryGetFont(next_entry,
                                                    &type_return);
        if (type_return == XmFONT_IS_FONTSET ) {
            font_tag = XmFontListEntryGetTag(next_entry);

            /* Prefer the font set tagged for the code set of the locale. */
            if (!strcmp(XmFONTLIST_DEFAULT_TAG, font_tag)) {
                XtFree(font_tag);
                XmFontListFreeFontContext(context);
                return fontset;
            }
            XtFree(font_tag);

            /* Remember the 1st fontset, just in case... */
            if (!have_font_set) {
                first_fontset = fontset;
                have_font_set = True;
            }
        }
    }
    XmFontListFreeFontContext(context);

    if (have_font_set)
        return first_fontset;
    return (XFontSet) NULL;
}


Font Functions 


The following Xlib font management API functions are available:


* XCreateFontSet()
* XLocaleOfFontSet()
* XFontsOfFontSet()
* XBaseFontNameListOfFontSet()
* XFreeFontSet()


The following Motif FontList API functions are available:


* XmFontListEntryCreate()
* XmFontListAppendEntry()
* XmFontListEntryFree()
* XmFontListEntryGetTag()
* XmFontListEntryGetFont()
* XmFontListEntryLoad()


Font Charsets


To improve basic interchange, fonts are organized according to the standard 
X-Consortium font charsets. 


Default Font Set Per Language Group 


Selecting base font names of a font set associated with a developer's 
language is usually easy because the developer is familiar with the 
language and the set of fonts needed. 


Yet, when selecting the base font names of a font set for various locales, 
this task can be difficult because an XLFD font specification consists of 15 
fields. For localized usage, the following fields are critical for selecting font 
sets: 


* FAMILY_NAME %F
* WEIGHT_NAME %W
* SLANT %S
* ADD_STYLE %A
* SPACING %SP


This reduces the number of fields to consider, yet the possible values for
each of these fields may vary per locale. The actual point size (POINT_SIZE)
may vary across platforms.


Throughout this documentation, the following convention should be used 
when specifying localized fonts: 
-dt-%F-%W-%S-normal-%A-*-*-*-*-%SP-*
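
As a sketch of how the convention expands, the following helper fills in
the critical fields. It assumes the convention reads
-dt-%F-%W-%S-normal-%A-*-*-*-*-%SP-*; the function name make_dt_font_name
is hypothetical, and the sample values in the usage note are taken from the
Latin ISO8859-1 recommendations in this section.

```c
#include <stdio.h>

/* Hypothetical helper: fill buf with a base font name that follows the
 * -dt-%F-%W-%S-normal-%A-*-*-*-*-%SP-* convention.
 * Returns 0 on success, or -1 if buf is too small. */
int make_dt_font_name(char *buf, size_t size, const char *family,
                      const char *weight, const char *slant,
                      const char *add_style, const char *spacing)
{
    int n = snprintf(buf, size, "-dt-%s-%s-%s-normal-%s-*-*-*-*-%s-*",
                     family, weight, slant, add_style, spacing);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Called with the values interface system, medium, r, sans, and p, the helper
produces -dt-interface system-medium-r-normal-sans-*-*-*-*-p-*. In a real
application, the resulting name belongs in a localized app-defaults file
rather than in the program itself.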


The following describes the minimum set of recommended values for each 
field to be used within the desktop for the critical fields when specifying 
font sets in resource (app-defaults) files. 


Latin ISO8859-1 Fonts

FOUNDRY        ‘dt’
FAMILY_NAME    ‘interface user’
               ‘interface system’
               ‘application’
WEIGHT_NAME    medium or bold
SLANT          r or i
ADD_STYLE      sans or serif
SPACING        p or m


Other ISO8859 Fonts


The same values defined for ISO8859-1 are recommended.


JIS Japanese Font

FOUNDRY        ‘dt’
FAMILY_NAME    Gothic or Mincho
WEIGHT_NAME    medium or bold
SLANT          r
ADD_STYLE      *
SPACING        m


KSC Korean Font

FOUNDRY        ‘dt’
FAMILY_NAME    Totum or Pathang
WEIGHT_NAME    medium or bold
SLANT          r
ADD_STYLE      *
SPACING        m


Note - The FAMILY_NAME values may change depending on the official
romanization of the two common font families in use. As background,
Totum corresponds to fonts typically shipped as Gothic, Kodig, or Dotum;
Pathang corresponds to fonts typically shipped as Myungo or Myeongjo.


CNS Traditional Chinese Font

FOUNDRY        ‘dt’
FAMILY_NAME    Sung and Kai
WEIGHT_NAME    medium or bold
SLANT          r
ADD_STYLE      *
SPACING        m


GB Simplified Chinese Font

FOUNDRY        ‘dt’
FAMILY_NAME    Song and Kai
WEIGHT_NAME    medium or bold
SLANT          r
ADD_STYLE      *
SPACING        m
Drawing Localized Text


Simple Text 


There are several mechanisms provided to render a localized string, 
depending on the Motif or Xlib library being used. The following discusses 
the interfaces that are recommended for internationalized applications. Yet, 
it is recommended that all localized data be externalized from the program 
using the simple text. 


The following Xlib multibyte (char*) drawing functions are available for
internationalization:


* XmbDrawImageString()
* XmbDrawString()
* XmbDrawText()


The following Xlib wide character (wchar_t*) drawing functions are
available for internationalization:


* XwcDrawImageString()
* XwcDrawString()
* XwcDrawText()


The following Xlib multibyte (char*) font metric functions are available for
internationalization:


* XExtentsOfFontSet()
* XmbTextEscapement()
* XmbTextExtents()
* XmbTextPerCharExtents()


The following Xlib wide character (wchar_t*) font metric functions are
available for internationalization:


* XExtentsOfFontSet()
* XwcTextEscapement()
* XwcTextExtents()
* XwcTextPerCharExtents()
XmString (Compound String)


For the Xm library, localized text should be inserted into XmString 
segments using XmStringCreateLocalized(). The tag associated with 
localized text is XmFONTLIST_DEFAULT_TAG, which is used to match an 
entry in a font list. Applications that mix several fonts within a compound 
string using XmStringCreate() should use XmFONTLIST_DEFAULT_TAG 
as the tag for any localized string. 


More importantly, for interclient communications, the
XmStringConvertToCT() function associates a segment tagged as
XmFONTLIST_DEFAULT_TAG as being encoded in the code set of the locale.
Otherwise, depending on the tag name used, the Xm library may not be
able to properly identify the encoding on interclient communications for
text data.


A localized string segment inside an XmString can be drawn with a font
list having a font set with XmFONTLIST_DEFAULT_TAG. Use of a localized
string is recommended for portability.


The following is an example of creating a font list for drawing a localized 
string: 

XmFontList CreateFontList( Display* dpy, char* pattern)
{
    XmFontListEntry font_entry;
    XmFontList fontlist;

    /* Load the pattern as a font set tagged for the locale's code set. */
    font_entry = XmFontListEntryLoad( dpy, pattern,
                                      XmFONT_IS_FONTSET,
                                      XmFONTLIST_DEFAULT_TAG);
    if ( font_entry == NULL ) {
        XtWarning("fl2fs: can't load font entry...");
        exit(1);
    }

    fontlist = XmFontListAppendEntry(NULL, font_entry);
    /* XmFontListEntryFree(&font_entry); */

    if ( fontlist == NULL ) {
        XtWarning("fl2fs: can't create fontlist...");
        exit(1);
    }
    return fontlist;
}


int main(int argc, char **argv)
{
    Display *dpy;               /* Display */
    XtAppContext app_context;   /* Application Context */
    XmFontList fontlist;
    XFontSet fontset;
    XFontStruct** fontstructs;
    char** fontnames;
    int i, n;
    char *progname;   /* program name without the full pathname */

    if ((progname = strrchr(argv[0], '/')) != NULL) {
        progname++;
    }
    else {
        progname = argv[0];
    }

    /* Initialize toolkit and open display. */
    XtSetLanguageProc(NULL, NULL, NULL);
    XtToolkitInitialize();
    app_context = XtCreateApplicationContext();
    dpy = XtOpenDisplay(app_context, NULL, progname, "XMdemos",
                        NULL, 0, &argc, argv);
    if (!dpy) {
        XtWarning("fl2fs: can't open display, exiting...");
        exit(1);
    }

    /* The font pattern is expected as the first remaining argument. */
    if (argc < 2) {
        fprintf(stderr, "usage: %s font-pattern\n", progname);
        exit(1);
    }

    fontlist = CreateFontList(dpy, argv[1]);
    fontset = FontList2FontSet( fontlist );
    if (fontset == NULL) {
        XtWarning("fl2fs: no fontset in fontlist, exiting...");
        exit(1);
    }

    /*
     * Print out BaseFontNames of Fontset
     */
    n = XFontsOfFontSet( fontset, &fontstructs, &fontnames);

    printf("Fonts for %s is %d\n", argv[1], n);

    for (i = 0; i < n; ++i)
        printf("font[%d] - %s\n", i, fontnames[i]);

    exit(0);
}


A localized string can be written in resource files because a compound
string specified in resource files has a locale-encoded segment with
XmFONTLIST_DEFAULT_TAG. For example, the fontList resource in the
following example is automatically associated with
XmFONTLIST_DEFAULT_TAG.


*labelString: Japanese string


*fontList: -dt-interface system-medium-r-normal-L*-*-*-*-*-*-*:


The following set of XmString functions is recommended for
internationalization:


* XmStringCreateLocalized()
* XmStringDraw()
* XmStringDrawImage()
* XmStringDrawUnderline()

The following set of XmString functions is not recommended for
internationalization because these functions depend on a text direction
that may not be appropriate for every language:


* XmStringCreateLtoR()
* XmStringSegmentCreate()


Inputting Localized Text


Input for localized text is typically done by using either the local input 
method or the network-based input method. 


The local input method means that the input method is built into Xlib. It
is typically used for a language that can be composed using simple rules 
and that does not require language-specific features. The network-based 
input method means that the actual input method is provided as separate 
servers, and Xlib communicates with them through the XIM protocol to do 
the language-specific composition. 


Basic Prompts and Dialogs 


It is strongly recommended that applications use the Text widget to do all 
text input. 
Input within a DrawingArea Widget


Many applications do their own drawing within a widget based on input. To
provide consistency within the desktop environment, XmIm functions are
recommended because the style and geometry management needed for an
input method is managed by the VendorShell widget class. The
application need only worry about handling key events, focus, and
communicating the current input location within the drawing area. Using
these functions requires some basic knowledge of the underlying Xlib input
method architecture, but a developer need only be concerned with the XmIm
pieces of information.


Application-Specific and Language-Specific Intermediate Feedback


Some applications may need to directly display intermediate feedback
during preediting, such as when an application's needs exceed the functions
supplied by Xlib. Examples include PostScript™ rendering and
vertical writing.


The core Xlib provides the common set of interfaces that allow an
application to display intermediate feedback during preediting. By
registering the application's callbacks and setting the preediting style to
XIMPreeditCallbacks, an application can get the intermediate preediting
data from the input method and can draw whatever it needs.


Applications intended to do sophisticated language processing may 
recognize extensions within a specific XIM implementation and its input 
method engines. Such applications are on the leading edge and will require 
familiarity with details of the XIM functions. 


Text and TextField Widget


For basic prompts and dialogs, the Text or TextField widget is
recommended. Besides resources, all of the XmTextField and XmText
functions are available for getting and for setting localized text inside a
Text[Field] widget.
Most XmText functions are based on the number of characters, not on the
number of bytes. For example, all XmTextPosition values are character
positions, not byte positions. The XmTextGetMaxLength()
function returns the number of bytes. When in doubt, remember that
positions are always in character units.
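
The distinction matters whenever an application indexes into the char*
buffer that holds the widget's text. The following sketch converts a
character position to a byte offset with the standard mblen() function.
The helper name char_pos_to_byte_offset is hypothetical, and the code
assumes the locale has already been set so that mblen() uses the code set
of the locale.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert a character position in a multibyte
 * string to a byte offset, using the code set of the current locale.
 * Returns (size_t)-1 if the string holds fewer than char_pos
 * characters or contains an invalid multibyte sequence. */
size_t char_pos_to_byte_offset(const char *s, size_t char_pos)
{
    size_t offset = 0;

    (void) mblen(NULL, 0);        /* reset any shift state */
    while (char_pos > 0) {
        int len = mblen(s + offset, MB_CUR_MAX);

        if (len <= 0)             /* end of string or invalid sequence */
            return (size_t)-1;
        offset += (size_t)len;
        --char_pos;
    }
    return offset;
}
```

In a single-byte locale the two values are always equal; in a multibyte
locale such as Japanese EUC, the byte offset is usually larger than the
character position.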


The width of a Text or TextField widget is determined by the resource 
value of XmNcolumns. But, this value means the number of the widest 
characters in the font set, not the number of bytes or columns. For 
example, suppose that you have selected a variable-width font for the Text 
widget. The character i may have a width of 1 pixel, while the character W 
may have a width of 7 pixels. When a value of 10 is set for XmNcolumns, 
this is considered a request to make the Text widget wide enough to be 
able to display at least 10 characters. So the Text widget must use the 
width of the widest character to determine the pixel width of its core 
widget. With this example, it may be able to display 10 W characters in the 
widget, or 70 i characters. This structure for XmNcolumns may cause 
problems in locales whose code set is a multibyte and a multicolumn 
encoding. As such, this value should be set within a localized resource. 
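
As a sketch, a localized app-defaults file could override the column count
for a single Text widget. The widget name nameField and the value 20 are
hypothetical, chosen only for illustration; app_foo follows the earlier
font resource examples.

```
! Localized app-defaults entry (hypothetical widget name and value)
app_foo*nameField.columns: 20
```

Each localized version of the file can then carry a value suited to the
character widths of its own locale.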


The following section identifies the set of functions available for 
applications that are used to manage input methods. For applications that 
use the Text and TextField widgets, refer to “Input Method (Keyboards)” 
on page 50. 


Character Input within Customized Widgets Not Using Text[Field] Widgets


In some cases, an application may obtain character input from the user but 
does not use a TextField or Text widget to do so. For example, an 
application using a DrawingArea widget may allow the user to type in text 
directly into the DrawingArea. In this case, the application could use the 
Xlib XIM functions as described in later sections, or alternatively, the 
application may use the XmIm functions of Motif 1.2. The XmIm functions
allow an application to connect to and interact with an input method with a
minimum of code. Further, they allow the Motif VendorShell widget to take
care of geometry management for the input method on the application's
behalf.


Although the XmIm functions are shipped in all implementations of Motif
1.2, the functions are not documented in Motif 1.2. OSF has announced its
intention to augment and document the XmIm functions for Motif 2.0. The
functions described here are the Motif 1.2 XmIm functions.


Note - The Motif 1.2 XmIm functions do not support preedit callback style
or status callback style input methods. The preedit callback can be used by
the Xlib API. For more information, see “XIM Management” on page 39.


Following are the XmIm functions you can safely use in a Motif 1.2-based
application. The formal description of the parameters and types can be
found in the Xm.h header file.


Function Name            Description

XmImRegister()           Performs XOpenIM() and queries the input method
                         for supported styles.

XmImSetValues()          Negotiates and selects the preedit and status styles.

XmImSetFocusValues()     Creates the XIC, if one does not exist. Notifies the
                         input method that the widget has gained the focus.
                         Sets the values passed to the XIC.

XmImUnsetFocus()         Notifies the input method that the widget has lost
                         the focus.

XmImMbLookupString()     Xm equivalent of XmbLookupString(); converts
                         one or more key events into a character. Return
                         value is identical to XmbLookupString().

XmImUnregister()         Disconnects the input method and the widget,
                         allowing connection to a new input method. Does not
                         necessarily close the input method
                         (implementation-dependent).


The XmImSetValues() and XmImSetFocusValues() functions allow the
application to pass information needed by the input method. It is important
for the application to pass all values even though not all values are needed
(for each supported preedit and status style). This is because the application
can never be sure which style has been selected by the user or the
VendorShell widget. Following are the arguments and data types of each
value that should be passed in each call to the XmImSet[Focus]Values()
function.


Argument Name          Data Type

XmNbackground          Pixel
XmNforeground          Pixel
XmNbackgroundPixmap    Pixmap
XmNspotLocation        XPoint
XmNfontList            Motif fontlist
XmNlineSpace           int (pixel height between consecutive baselines)


The XmIm functions are used in the following manner:


* Before initializing the toolkit, the application should call
XtSetLanguageProc(NULL, NULL, NULL) to initialize the locale.


* After creating the widget where character input is desired, the
application should call XmImRegister(widget) to open the input
method and establish a connection.


* After establishing a connection to the input method, the application
should pass the initial XIC values to the input method by calling
XmImSetValues() and passing all of the values listed above. This
function takes an arglist and a number_args argument. The arglist is
loaded by calling XtSetArg().


* Add an event handler, through the XtAddEventHandler() function, for
the manager widget of the widget obtaining input from the input
method. The event handler is for the FocusChangeMask mask. The
handler should call XmImSetFocusValues() when gaining focus and
should call XmImUnsetFocus() when losing focus. When setting focus
for the input method, pass the full set of values listed above.


* Add a DestroyCallback for the widget obtaining input from the input
method. In the destroy callback, call XmImUnregister() to notify the
input method that you are breaking the connection between the widget
and the input method.


* Use XmImSetValues() to notify the input method any time one or more
of the input method values listed above change (for example,
spotLocation).


XIM Management 


Following are the XIM management functions. 


Function Name      Description

XOpenIM()          Establishes a connection to an input method.

XCloseIM()         Removes a connection to an input method previously
                   established with a call to XOpenIM().

XGetIMValues()     Queries the input method for a list of properties.
                   Currently, the only standard argument in Xlib is
                   XNQueryInputStyle.

XDisplayOfIM()     Returns the display associated with an input
                   method.

XLocaleOfIM()      Returns a string identifying the locale of the input
                   method. There are no standard strings; the value
                   returned by this call is implementation-defined.

XCreateIC()        Creates an input context. The input context contains
                   both the data required (if any) by an input method
                   and the information required to display that data.

XDestroyIC()       Destroys an input context, freeing any associated
                   memory.

XIMOfIC()          Returns the input method currently associated with
                   a given input context.

XSetICValues()     Passes zero or more values to an input context to
                   control input of character data, or control display of
                   preedit or status information. A table of all valid
                   input context value arguments can be found in the
                   X11R5 specification.

XGetICValues()     Queries an input context to get zero or more input
                   context values. A table of all valid input context
                   value arguments can be found in the X11R5
                   specification.
XIM Event Handling


Following are the XIM event handling functions:


Function Name        Description

XmbLookupString()    Converts keypress events into characters.

XwcLookupString()    Converts keypress events into wide characters.

XmbResetIC()         Resets an input context to its initial state. Any input
                     pending on that context is deleted. Returns the
                     current preedit value as a char* string. Depending
                     on the implementation of the input method, the
                     return value may be NULL.

XwcResetIC()         Resets an input context to its initial state. Any input
                     pending on that context is deleted. Returns the
                     current preedit value as a wchar_t* string.

XFilterEvent()       Allows the input method to process any incoming
                     events to the clients before the application processes
                     them.

XSetICFocus()        Notifies the input method that the focus window
                     attached to the specified input context has received
                     keyboard focus.

XUnsetICFocus()      Notifies the input method that the specified input
                     context has lost the keyboard focus and that no more
                     input is expected on the focus window attached to
                     that context.


XIM Callback


X Input Methods (XIMs) provide three categories of callbacks. One is
preedit callbacks, which allow applications to display the intermediate
feedback during preediting. The second is geometry callbacks, which allow
applications and XIM to negotiate the geometry to be used for XIM. The
third is status callbacks, which allow applications to display the internal
status of XIM.
Table 2-2 XIM Callbacks


XIM Preedit Callbacks       XIM Status Callbacks       XIM Preedit Caret Callbacks   XIM Geometry Callbacks

(*PreeditStartCallback)()   (*StatusStartCallback)()   (*PreeditCaretCallback)()     (*GeometryCallback)()
(*PreeditDoneCallback)()    (*StatusDoneCallback)()
(*PreeditDrawCallback)()    (*StatusDrawCallback)()


Extracting Localized Text


Although there are different methods to localize an application, the general 
rule is that any language-dependent information is outside the application 
and is stored in separate directories identified by a locale name. 


This section describes how the user, the application developer, and the 
implementation combine to establish the language environment of the 
application. Two general approaches to localizing applications are also 
discussed. The following three methods can be used: 


* Resource files
* Message catalogs
* Private files


Resource Files 


This is the GUI toolkit mechanism for customizing all sorts of information 
about an application. The Intrinsic library (libXt) provides a sophisticated 
mechanism for merging the command-line options, application-defined 
resources, and user-defined resources. Resource files can be used for 
extracting localized text. The difference between resource files and message 
catalogs is that the resource database is compiled each time it is loaded. As 
such, care should be taken when deciding which strings to place in resource 
files and which to place in message catalogs.

Also note that the Xm library functions do not depend on the
LC_MESSAGES category when specifying the location from which localized
resources are loaded. Refer to the XtSetLanguageProc() man page for
more information.


Message Catalogs


This is the traditional operating system mechanism for accessing external
databases containing localized text. These functions load a precompiled
catalog file that is ready to be accessed. They also provide defaults within
the actual program for cases when no catalogs may be found.


The messaging support is based on both the XPG4 and System V Release 4
(SVR4) interfaces for accessing message catalogs.


Private Files


Private databases can be used by applications to provide generic,
customized databases for more than just localization text. Usually, such
databases do contain text. It is recommended that if the database is to be
spread out over many files, some run-time indirect access of localized text
be provided. Without this access, localization for the average user is a
difficult effort. Generally, such private file formats are discouraged by
groups doing localization. But problems are reduced if a tool is provided
specifically for localization of text only.


Message Guidelines


Message guidelines foster consistent formatting of message and help
information. They also promote creation and maintenance of messages that
can be easily understood by inexperienced English-speaking end users, as
well as by inexperienced translators. Use these guidelines to create
message files that are consistent in language and clear in meaning.
Distribution of these guidelines enables programmers and writers to
coordinate their message-writing efforts. Default messages, external
message files, and planned delivery of translatable messages are required
for each executable to fully implement international language support.


Message Extraction Functions


One of the requirements of internationalizing programs (basic commands 
and utilities inclusive) is that the messages displayed on the output devices 
be in the language of the user. As these programs may be used in many 
countries (international locales), the messages must be translated into the 
various languages of these countries. 


There are two sets of message extraction functions in the desktop 
environment: XPG4 functions and Xlib functions. 


XPG4/Universal UNIX Messaging Functions


The XPG4 message facility consists of several components: message source
files, catalog generation facilities, and programming interfaces. Following
are the XPG4/Universal UNIX™ message functions:


* catopen()
* catgets()
* catclose()


XPG4 Messaging Examples 


This example, which demonstrates how to retrieve a message from a catalog,
has three parts. The first part shows the message source file, the second
part shows the method used to generate the catalog file, and the third part
shows an example program using this catalog.


Message Source File 


The message catalog can be specified as follows:

example.msg file:

$quote "
$ every message catalog should have a beginning set number.
$set 1 This is the set 1 of messages
1 "Hello world\n"
2 "Good Morning\n"
3 "example: 1000.220 Read permission is denied for the file %s.\n"
$set 2
1 "Howdy\n"


Generation of Catalog File


This file is input to the gencat utility to generate the message catalog
example.cat as follows:


gencat example.cat example.msg


Accessing the Catalog in a Program


#include <stdio.h>
#include <locale.h>
#include <nl_types.h>

char *MF_EXAMPLE = "example.cat";

int main(void)
{
    nl_catd catd;

    /* Use the locale selected by the user's environment. */
    (void) setlocale(LC_ALL, "");

    catd = catopen(MF_EXAMPLE, 0);

    /* Get the message number 1 from the first set. */
    printf(catgets(catd, 1, 1, "Hello world\n"));

    /* Get the message number 1 from the second set. */
    printf(catgets(catd, 2, 1, "Howdy\n"));

    /* Display an error message (message number 3 from the first set). */
    printf(catgets(catd, 1, 3,
        "example: 100.220 Permission is denied to read the file %s.\n"),
        MF_EXAMPLE);

    catclose(catd);
    return 0;
}
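
The default strings passed to catgets() matter when no catalog is installed
for the user's locale. The following is a minimal sketch of that fallback
behavior, not part of the original example. The helper name
lookup_message() and its parameters are hypothetical; only catopen(),
catgets(), and catclose() come from the XPG4 interface described above.

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy one message into out, falling back to the
 * built-in default text when the catalog cannot be opened or does not
 * contain the requested message. Returns out.
 */
char *lookup_message(const char *catalog, int set_id, int msg_id,
                     const char *fallback, char *out, size_t cap)
{
    nl_catd catd = catopen(catalog, 0);
    const char *text = fallback;

    if (catd != (nl_catd) -1) {
        /* catgets() itself returns fallback if the message is missing. */
        text = catgets(catd, set_id, msg_id, fallback);
    }

    /* Copy before closing: the returned pointer may refer to storage
       owned by the catalog, which catclose() is allowed to release. */
    snprintf(out, cap, "%s", text);

    if (catd != (nl_catd) -1)
        catclose(catd);
    return out;
}
```

Called with a catalog name that does not exist, the helper leaves the
default text in the caller's buffer, so the program still produces usable
output in an unlocalized environment.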


Xlib Messaging Functions 


The following Xlib messaging functions provide similar input/output (I/O)
operations for resources:


XrmPutFileDatabase()
XrmGetFileDatabase()
XrmGetStringDatabase()
XrmLocaleOfDatabase()

They are described in X Window System, The Complete Reference to Xlib,
X Protocol, ICCCM, XLFD - X Version 11, Release 5.


Xlib Message and Resource Facilities


Part of internationalizing a system environment toolkit-based application
is not having any locale-specific data hardcoded within the application
source. One common locale-specific item is messages (error and warning)
returned by the application to standard I/O.


In general, for any error or warning messages to be displayed to the user
through a system environment toolkit widget or gadget, externalize the
messages through message catalogs.


For dialog messages to be displayed through a toolkit component,
externalize the messages through localized resource files. This is done in
the same way as localizing resources, such as the XmLabel and
XmPushButton classes' XmNlabelString resource or window titles.


For example, if a warning message is to be displayed through an
XmMessageBox widget class, the XmNmessageString resource cannot be
hardcoded within the application source code. Instead, the value of this
resource must be retrieved from a message catalog. For an
internationalized application expected to run in different locales, a distinct
localized catalog must exist for each of the locales to be supported. In this
way, the application need not be rebuilt.


Localized resource files can be put in the /usr/lib/X11/%L/app-defaults
subdirectories, or they can be pointed to by the XENVIRONMENT
environment variable. The %L variable is replaced with the name of the
locale used at run time.


Localized Resources


This section describes which widget and gadget resources are
locale-sensitive. The information is organized by related functionality. For
example, the first section describes those resources that are locale-sensitive
for widgets used to display labels or to provide push-button functionality.


Labels and Buttons


Table 2-3 lists the localized resources that are used as labels. Many of them 
are of type XmString. The rest are of type color or char*. See the Motif 1.2 
Reference Manual for detailed descriptions of these resources. In each case, 
the application should not hardcode these resources. If resource values 
need to be specified by the application, it should be done with the app- 
defaults file, ensuring that the resource can be localized. 


Only the widget class resources are listed here; subclasses of these widgets 
are not listed. For example, the XmDrawnButton widget class does not 
introduce any new resources that are localized. However, it is a subclass of 
the XmLabelWidget widget class; therefore, its accelerator resource, 
acceleratorText resource, and so on, are also localized and should not be 
hardcoded by an application. 


Table 2-3 Localized Resources 


Widget Class Resource Name 

Core *background:¹
XmCommand *command: 

XmCommand *promptString: 
XmFileSelectionBox *dirListLabelString: 
XmFileSelectionBox *fileListLabelString: 
XmFileSelectionBox *filterLabelString: 
XmFileSelectionBox *noMatchString: 
XmLabel[Gadget] *accelerator:
XmLabel[Gadget] *acceleratorText:
XmLabel[Gadget] *labelString:
XmLabel[Gadget] *mnemonic:

XmList *stringDirection: 
XmManager *stringDirection: 
XmMessageBox *cancelLabelString: 
XmMessageBox *helpLabelString: 
XmMessageBox *messageString:
XmMessageBox *okLabelString:
XmPrimitive *foreground:¹
XmRowColumn *labelString: 
XmRowColumn *menuAccelerator: 
XmRowColumn *mnemonic: 
XmRowColumn (SimpleMenu* ) *buttonAccelerators: 
XmRowColumn *mnemonic: 
XmRowColumn *mnemonic: 
XmRowColumn *mnemonic: 
XmRowColumn *mnemonic: 
XmSelectionBox *applyLabelString: 
XmSelectionBox *cancelLabelString: 
XmSelectionBox *helpLabelString: 
XmSelectionBox *listLabelString: 
XmSelectionBox *okLabelString: 
XmSelectionBox *selectionLabelString: 
XmSelectionBox *textAccelerators: 


1. The foreground and background colors are not localized due to restrictions in the X protocol that require 
color names to be limited to the portable character set. Localized color names are left to applications to 
provide a localized database to map to a name encoded with the portable character set. 


Note that the XmRowColumn widget has additional string resources that 
may be localized. These resources are listed in the XmRowColumn man page, 
under the heading “Simple Menu Creation Resource Set.” As the title 
implies, these resources affect only RowColumn widgets created with the 
XmCreateSimpleMenu() function. The resources affected are: 
*buttonAccelerators, *buttonAcceleratorText, 
*buttonMnemonics, *optionLabel, and *optionMnemonic. These 
resources are not included in Table 2-3 because they are rarely used and 
apply to RowColumn only when creating a simple menu. 


List Resources


Several widgets allow applications to set or read lists of items in the
widget. Table 2-4 shows which widgets allow this and the resources they
use to set or read these lists. Because the list items may need to be
localized, do not hardcode these lists. Rather, they should be set as
resources in app-defaults files, allowing them to be localized. The type
for each list is XmStringList.


Table 2-4 Resources Used for Reading Lists


Widget Class        Resource Name
XmList              *items:
XmList              *selectedItems:
XmSelectionBox      *listItems:


Title


Table 2-5 lists the resources used for setting titles and icon names.
Normally, an application need only set the *title: and *iconName:
resources. The encoding of each is automatically detected for clients doing
proper locale management. All of these are of type char or XmString.


Table 2-5 Resources Used for Setting Titles and Icon Names


Widget Class        Resource Name
TopLevelShell       *iconName:
TopLevelShell       *iconNameEncoding:¹
WMShell             *title:
WMShell             *titleEncoding:¹
XmBulletinBoard     *dialogTitle:
XmScale             *titleString:


1. This resource should not be set by the application. If the application
calls XtSetLanguageProc, the default value (None) of this resource will
automatically be set, ensuring that localized text can be used for the
title.


Text Widget


Table 2-6 lists the Text[Field] resources that are locale-sensitive or
about which the developer of an internationalized application should know.


Table 2-6 Locale-Sensitive Text[Field] Resources


Widget Class        Resource Name
XmSelectionBox      *textColumns:¹
XmSelectionBox      *textString:
XmText              *columns:¹
XmText              *modifyVerifyCallback:
XmText              *modifyVerifyCallbackWcs:
XmText              *value:
XmText              *valueWcs:
XmTextField         *columns:¹
XmTextField         *modifyVerifyCallback:
XmTextField         *modifyVerifyCallbackWcs:
XmTextField         *value:
XmTextField         *valueWcs:


1. The *columns resource specifies the initial width of the Text[Field]
widget in terms of the number of characters to be displayed. In the case of
a variable width font or in a locale where the size of a character varies
significantly, a column is the amount of space required to display the
widest character in that locale's character repertoire. For example, a
column width of 10 guarantees that at least 10 characters of the current
locale can be displayed; it is possible (likely) that more than that number
of characters can be displayed in the allocated space.


Input Method (Keyboards)


Table 2-7 lists localized resources for customizing the input method. These 
resources allow the user or the application to control which input method 
will be used for the specified locale and which preedit style (if applicable 
and available) will be used. 


Table 2-7 Localized Resources for Input Method Customization 


Widget Class Resource Name 
VendorShell *inputMethod: 
VendorShell *preeditType: 


Pixmap (Icon) Resources 


Table 2-8 lists pixmap resources. In some cases, a different pixmap may be 
needed for a given locale. 


Table 2-8 Pixmap Resources 


Widget Class Resource Name 

Core *backgroundPixmap: 
WMShell *iconPixmap: 

XmDragIcon *pixmap: 

XmDropSite *animation[Mask|Pixmap]: 
XmLabel[Gadget] *labelInsensitivePixmap:
XmLabel[Gadget] *labelPixmap:
XmMessageBox *symbolPixmap:
XmPushButton[Gadget] *armPixmap:
XmToggleButton[Gadget] *selectInsensitivePixmap:
XmToggleButton[Gadget] *selectPixmap:


A pixmap is a screen image that is stored in memory so that it can be 
recalled and displayed when needed. The desktop has a number of pixmap 
resources that allow the application to supply pixmaps for backgrounds, 
borders, shadows, label and button faces, drag icons, and other uses. As 
with text, some pixmaps may be specific to particular language 
environments; these pixmaps must be localized.

The desktop maintains caches of pixmaps and images. The
XmGetPixmapByDepth() function searches these caches for a requested
pixmap. If the requested pixmap is not in the pixmap cache and a
corresponding image is not in the image cache, the
XmGetPixmapByDepth() function searches for an X bitmap file whose
name matches the requested image name. The XmGetPixmapByDepth()
function calls the XtResolvePathname() function to search for the file. If
the requested image name is an absolute path name, that path name is the
search path for the XtResolvePathname() function. Otherwise, the
XmGetPixmapByDepth() function constructs a search path in the following
way:


* If the XBMLANGPATH environment variable is set, the value of that
  variable is the search path.


* If XBMLANGPATH is not set but XAPPLRESDIR is set, the
  XmGetPixmapByDepth() function uses a default search path with
  entries that include $XAPPLRESDIR, the user's home directory, and
  vendor-dependent system directories.


* If neither XBMLANGPATH nor XAPPLRESDIR is set, the
  XmGetPixmapByDepth() function uses a default search path with
  entries that include the user's home directory and vendor-dependent
  system directories.


These paths may include the %B substitution field. In each call to the
XtResolvePathname() function, the XmGetPixmapByDepth() function
substitutes the requested image name for %B. The paths may also include
other substitution fields accepted by the XtResolvePathname() function.
In particular, the XtResolvePathname() function substitutes the display's
language string for %L, and it substitutes the components of the display's
language string (in a vendor-dependent way) for %l, %t, and %c. The
substitution field %T is always mapped to bitmaps, and %S is always
mapped to Null.
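
The substitution rules above can be sketched in a few lines of C. This is
an illustration only, not the XtResolvePathname() implementation: the
function name expand_pixmap_path() and its parameters are hypothetical,
only the %B, %L, %T, and %S fields are handled, and the vendor-dependent
%l, %t, and %c fields are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: expand one search-path entry.
 *   %B -> requested image name     %L -> language string
 *   %T -> "bitmaps"                %S -> empty string
 * Any other character, including an unrecognized % field, is copied
 * as is. Returns 0 on success, -1 if the result does not fit in out.
 */
int expand_pixmap_path(const char *entry, const char *lang,
                       const char *image, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;

    for (const char *p = entry; *p != '\0'; p++) {
        const char *piece = NULL;
        char literal[2] = { *p, '\0' };

        if (*p == '%') {
            switch (p[1]) {
            case 'B': piece = image;     break;
            case 'L': piece = lang;      break;
            case 'T': piece = "bitmaps"; break;
            case 'S': piece = "";        break;
            default:                     break;
            }
            if (piece != NULL)
                p++;                 /* consume the field letter */
        }
        if (piece == NULL)
            piece = literal;         /* ordinary character */

        size_t len = strlen(piece);
        if (used + len + 1 > cap)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

For example, expanding the entry /usr/lib/X11/%L/%T/%B%S with the language
string ja_JP and the image name icon yields /usr/lib/X11/ja_JP/bitmaps/icon.
The real function additionally walks a colon-separated list of such entries
and tests each candidate file.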


Because there is no string-to-pixmap converter supplied by default,
pixmaps are generally set by the application at creation time by first
retrieving the pixmap with a call to XmGetPixmap(). XmGetPixmap() uses
the current locale to determine where to locate the pixmap. (See the
XmGetPixmap() man page for a description of how locale is used to locate
the pixmap.)

Font Resources


Table 2-9 lists the localized font resources. All XmFontList resources are of
type XmFontList. In almost all cases, a fontset should be used when
specifying a fontlist element. The only exception is when displaying
character data that does not appear in the character set of the user (for
example, displaying math symbols or dingbats).


Table 2-9 Localized Font Resources 


Widget Class Resource Name 
VendorShell *buttonFontList: 
VendorShell *defaultFontList: 
VendorShell *labelFontList: 
VendorShell *textFontList: 
XmBulletinBoard *buttonFontList: 
XmBulletinBoard *defaultFontList: 
XmBulletinBoard *labelFontList: 
XmBulletinBoard *textFontList: 
XmLabel[Gadget] *fontList:

XmList *fontList: 
XmMenuShell *buttonFontList: 
XmMenuShell *defaultFontList: 
XmMenuShell *labelFontList: 
XmText *fontList: 
XmTextField *fontList: 


Operating System Internationalized Functions

Table 2-10 lists the base operating system internationalized functions in a
common open software environment.

Applications should perform proper locale management with the
assumption that a locale may have from 1 to 4 bytes per coded character.
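
Because a character may occupy from 1 to 4 bytes, byte counts such as the
result of strlen() cannot be used as character counts. The following sketch
counts characters with mblen(), one of the multibyte functions in Table
2-10. The helper name mb_char_count() is hypothetical, and the code assumes
the caller has already selected the locale with setlocale().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the characters (not bytes) in a string
 * encoded in the current locale's code set. Returns -1 if the string
 * contains a byte sequence that is not a valid character.
 */
long mb_char_count(const char *s)
{
    size_t remaining = strlen(s);
    long count = 0;

    (void) mblen(NULL, 0);          /* reset any internal shift state */

    while (remaining > 0) {
        size_t limit = remaining < (size_t) MB_CUR_MAX
                     ? remaining : (size_t) MB_CUR_MAX;
        int n = mblen(s, limit);    /* bytes in the next character */

        if (n <= 0)
            return -1;              /* invalid or truncated character */
        s += n;
        remaining -= (size_t) n;
        count++;
    }
    return count;
}
```

In a single-byte locale the result equals strlen(); in a multibyte locale
it is smaller whenever the string contains multibyte characters.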


Table 2-10 Base Operating System Internationalized Functions


Locale Management       Single-byte     Multibyte       Wide Character

Convert mb <-> wc                       mbtowc          wctomb
                                        mbstowcs        wcstombs

Classification          isalpha                         iswalpha
                        is*                             isw*
                                                        wctype

Case Mapping            tolower                         towlower
                        toupper                         towupper

Format Miscellaneous    localeconv
                        nl_langinfo

Format of Numeric       strtol                          wcstol
                        strtod                          wcstod
                                                        wcstoi

Format Time/Monetary    strftime                        wcsftime
                        strptime
                        strfmon

String Copy             strcat                          wcscat
                        strcpy                          wcsncat
                        strncat                         wcscpy
                        strncpy                         wcsncpy

String Collate          strcoll                         wcscoll
                                                        wcsxfrm

String Misc             strlen          mblen           wcscmp
                                                        wcsncmp

String Search           strchr                          wcschr
                        strcspn                         wcscspn
                        strpbrk                         wcspbrk
                        strrchr                         wcsrchr
                        strspn                          wcsspn
                        strtok                          wcstok
                                                        wcswcs

I/O Display Width                                       wcwidth¹
                                                        wcswidth

I/O Printf              printf          printf
                        vprintf         vprintf
                        sprintf         sprintf
                        vsprintf        vsprintf
                        fprintf         fprintf
                        vfprintf        vfprintf

I/O Scan                scanf           scanf
                        sscanf          sscanf
                        fscanf          fscanf

I/O Character           getc                            fgetwc
                        gets                            fgetws
                        putc                            fputwc
                        puts                            fputws
                                                        ungetwc

Message                 gettxt
                        catopen
                        catgets
                        catclose

Convert Codeset         iconv_open
                        iconv
                        iconv_close


1. These functions are provided for applications using terminals. Graphical
user interface (GUI) applications should not use these functions; instead,
they should use font metric functions listed on page 31 to determine
spacing.


This chapter discusses tasks related to internationalization and distributed
networks.


Interchange Concepts


This section describes the way 8-bit user names and 8-bit data can be 
communicated on a network for communications utilities, such as ftp, mail, 
or interclient communication between the desktop clients. 


There are three primary considerations for communicating data:

* Sender's code set and the receiver's code set.


* Whether the communications protocol allows 8-bit data or is limited to
  7-bit coded data (for example, the Japanese JUNET passes Japanese
  Industrial Standard (JIS) coded data over 7-bit protocols).


* Type of interchange encoding available, per protocol rules. The actual
  conversion needed is dependent on the specific protocol used.


If the remote host uses the same code set as the local host, the following is
true:


* If the protocol allows 8-bit data, no conversions are needed.


* If the protocol allows only 7-bit data, a method is needed to map the
  8-bit code points to 7-bit ASCII values. This could be accomplished using
  the iconv framework and one of the following types of 7-bit encoding
  methods:

  * Map 8-bit data as specified in the POSIX.2 specification for uuencode
    and uudecode algorithms.

  * Optionally, the 8-bit data may be mapped to a 7-bit interchange
    encoding as defined by the protocol; for example, 7-bit ISO2022 in Xlib
    or base64 in Multipurpose Internet Mail Extensions (MIME).


If the remote host's code set is different from that of the local host, the
following two cases may apply. The conversion needed is dependent on the
specific protocol used.


* If the protocol allows 8-bit data, the protocol will need to specify which
  side does the iconv conversion and to specify the encoding on the wire.
  In some protocols, an 8-bit interchange encoding is recommended that is
  capable of encoding all possible code sets and identifying character
  repertoire.


* If the protocol allows only 7-bit data, a 7-bit interchange encoding is
  needed, as is the identifying character repertoire.
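
The four cases above amount to a two-way decision: whether the code sets
match, and whether the protocol is 8-bit clean. The following sketch
restates that decision in code. The enum, the function name
choose_wire_action(), and the convention that a NULL remote code set means
"unknown" are assumptions made for illustration; they are not part of any
desktop interface.

```c
#include <string.h>

/* Hypothetical labels for the four cases described in the text. */
enum wire_action {
    SEND_AS_IS,                    /* same code set, 8-bit protocol      */
    MAP_8BIT_TO_7BIT,              /* same code set, 7-bit protocol      */
    CONVERT_VIA_8BIT_INTERCHANGE,  /* different code sets, 8-bit protocol */
    CONVERT_VIA_7BIT_INTERCHANGE   /* different code sets, 7-bit protocol */
};

/*
 * Decide what the sender has to do before putting data on the wire.
 * remote_codeset may be NULL when the receiver's code set is unknown,
 * which is treated the same as a different code set.
 */
enum wire_action choose_wire_action(const char *local_codeset,
                                    const char *remote_codeset,
                                    int protocol_is_8bit)
{
    int same = remote_codeset != NULL &&
               strcmp(local_codeset, remote_codeset) == 0;

    if (same)
        return protocol_is_8bit ? SEND_AS_IS : MAP_8BIT_TO_7BIT;

    return protocol_is_8bit ? CONVERT_VIA_8BIT_INTERCHANGE
                            : CONVERT_VIA_7BIT_INTERCHANGE;
}
```

A real client would also need the protocol's own rules to pick the specific
interchange encoding; the sketch only identifies which case applies.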


iconv Interface 


In a network environment, the code sets of the communicating systems and 
the protocols of communication determine the transformation of user- 
specified data so that it can be sent to the remote system in a meaningful 
way. The user data (not user names) may need to be transformed from the 
sender’s code set to the receiver's code set, or 8-bit data may need to be 
transformed into a 7-bit form to conform to protocols. A uniform interface is 
needed to accomplish this. 


The following examples illustrate the iconv interface by showing how to
use iconv_open(), iconv(), and iconv_close(). To do the conversion,
iconv_open() must be followed by iconv(). The iconv_open() function takes
the target code set as its first argument and the source code set as its
second argument. The terms 7-bit interchange and 8-bit interchange are
used to refer to any interchange encoding used for 7-bit and 8-bit data,
respectively.

Sender and Receiver Use the Same Code Sets:


* If the protocol allows 8-bit data, use 8-bit data because the same code
  set is being used. No conversion is needed.


* If the protocol allows only 7-bit data, use iconv:

  * Sender

    cd = iconv_open("uucode", locale_codeset);

  * Receiver

    cd = iconv_open(locale_codeset, "uucode");


Sender and Receiver Use Different Code Sets:

* If the protocol allows 8-bit data:

  * Sender

    cd = iconv_open(8-bitinterchange, locale_codeset);

  * Receiver

    cd = iconv_open(locale_codeset, 8-bitinterchange);

* If the protocol allows only 7-bit data, do the following:

  * Sender

    cd = iconv_open(7-bitinterchange, locale_codeset);

  * Receiver

    cd = iconv_open(locale_codeset, 7-bitinterchange);

The locale_codeset refers to the code set being used locally by the
application. Note that while the nl_langinfo(CODESET) function may be
used to obtain the code set associated with the current locale, it is
implementation-dependent whether any conversion names match the
return from the nl_langinfo(CODESET) function.
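
Because the name returned by nl_langinfo(CODESET) is not guaranteed to
match a converter name, a sender should always check the result of
iconv_open(). The following is a minimal sketch under that assumption. The
helper name open_sender_converter() is hypothetical, and the interchange
name is whatever the protocol in use dictates.

```c
#include <iconv.h>
#include <langinfo.h>

/*
 * Hypothetical helper: open a converter from the current locale's code
 * set to the given interchange encoding. The caller is expected to have
 * called setlocale() first. Returns (iconv_t) -1 when the system has no
 * converter for this pair of names; the caller must handle that case.
 */
iconv_t open_sender_converter(const char *interchange)
{
    const char *local = nl_langinfo(CODESET);

    /* iconv_open() takes the target code set first, then the source. */
    return iconv_open(interchange, local);
}
```

When the call fails, an application might retry with a locally maintained
alias for the code set name, or refuse to send rather than transmit
mislabeled data.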



Table 3-1 outlines how iconv can be used to perform conversions for
various conditions. Specific protocols may dictate other conversions needed.


Table 3-1 Using iconv to Perform Conversions


                      Communication with         Communication with system
                      system using the same      using different code sets or
                      code set (for example,     receiver's code set is
                      XYZ)                       unknown

Conversion            7-bit       8-bit          7-bit            8-bit
to Use                Protocol    Protocol       Protocol         Protocol

code XYZ              Invalid     Best Choice    Invalid          Invalid if
                                                                  remote code set
                                                                  is unknown

7-bit Interchange     OK          OK             Best Choice      OK
  ISO2022

8-bit Interchange     Invalid¹    OK             Invalid          Best Choice
  ISO2022
  ISO 10646

7-bit Untagged        OK          OK             Requires         Requires
  quoted-printable                               code set         code set
  uucode                                         identification   identification

8-bit Untagged        Invalid     OK             Requires         Requires
  base64                                         code set         code set
                                                 identification   identification


1. Invalid means the interchange encoding should not be used for the choice
of code set and type of protocol.


Stateful and Stateless Conversions


Code sets can be classified into two categories: stateful encodings and 
stateless encodings. 


Stateful Encodings


Stateful encoding uses sequences of control codes, such as shift-in/shift-out, 
to change character sets associated with specific code values. 


For instance, under compound text, the control sequence "ESC$(B" can be
used to indicate the start of Japanese 16-bit data in a data stream of
characters, and "ESC(B" can be used to indicate the end of this double-byte
character data and the start of 8-bit ASCII data. Under this stateful
encoding, the bit value 0x43 could not be interpreted without knowing the 
shift state. The EBCDIC Asian code sets use shift-in/shift-out controls to 
swap between double and single-byte encodings, respectively. 


Converters that are written to do the conversion of stateful encodings to 
other code sets tend to be a little complex due to the extra processing 
needed. 
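
The extra processing can be seen in even a very small decoder. The
following sketch counts the characters in a byte stream that uses the two
control sequences from the example above. It is an illustration under
simplifying assumptions: the function name count_stateful_chars() is
hypothetical, only these two sequences are recognized, and every character
in the double-byte state is assumed to occupy exactly two bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count characters in a stream that switches to
 * double-byte data with ESC $ ( B and back to single-byte data with
 * ESC ( B. Returns -1 on an unknown escape sequence or on a double-byte
 * character that is cut off at the end of the buffer.
 */
long count_stateful_chars(const unsigned char *buf, size_t len)
{
    int double_byte = 0;    /* current shift state */
    long count = 0;
    size_t i = 0;

    while (i < len) {
        if (buf[i] == 0x1B) {                       /* ESC */
            if (i + 3 < len && buf[i + 1] == '$' &&
                buf[i + 2] == '(' && buf[i + 3] == 'B') {
                double_byte = 1;
                i += 4;
            } else if (i + 2 < len && buf[i + 1] == '(' &&
                       buf[i + 2] == 'B') {
                double_byte = 0;
                i += 3;
            } else {
                return -1;
            }
            continue;
        }
        if (double_byte) {
            if (i + 1 >= len)
                return -1;
            i += 2;         /* two bytes form one character */
        } else {
            i += 1;         /* one byte forms one character */
        }
        count++;
    }
    return count;
}
```

The same byte value, 0x43 for instance, is counted as a whole character in
the single-byte state and as half of a character in the double-byte state,
which is why the stream cannot be processed from an arbitrary starting
point.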


Stateless Encodings


Stateless code sets are those that can be classified as one of two types:


* Single-byte code sets, such as the ISO8859 family
* Multibyte code sets, such as PC codes for Japanese and Shift-JIS (SJIS)


The term multibyte code sets is also used to refer to any code set that needs
one or more bytes to encode a character; multibyte code sets are considered
stateless.


Note - Conversions are meaningful only if the code sets represent the same
character set.

Simple Text Basic Interchange


When a program communicates data to another program residing on a 
remote host, a need may arise for conversion of data from the code set of 
the source machine to that of the receiver. For example, this happens when 
a PC system using PC codes needs to communicate with a workstation
using an International Organization for Standardization/Extended UNIX
Code (ISO/EUC) encoding. Another example occurs when a program obtains
data in one code set but has to display this data in another code set. To 
support these conversions, a standard program interface is provided based 
on the XPG4 iconv() function definitions. 


All components doing code set conversion should use the iconv functions 
as their interface to conversions. Systems are expected to provide a wide 
variety of conversions, as well as a mechanism to customize the default set 
of conversions. 


iconv Conversion Functions 


The common method of conversions from one code set to another is through 
a table-driven method. In some cases, these tables may be too large, hence 
an algorithmic method may be more desirable. To accommodate such 
diverse requirements, a framework is defined in XPG4 for code set 
conversions. In this framework, to convert from one code set to another, 
open a converter, perform the conversions, and close the converter. The
iconv functions are iconv_open(), iconv(), and iconv_close().
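
The open, convert, and close sequence can be sketched as follows. The
helper name convert_text() is hypothetical, and the code set names a caller
passes in (for example "ISO-8859-1" and "UTF-8") are assumptions: the set
of available converters and their names is implementation-dependent.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert in_len bytes from fromcode to tocode.
 * Returns the number of bytes written to out, or (size_t) -1 if the
 * converter cannot be opened, the input is invalid, or out is too small.
 */
size_t convert_text(const char *tocode, const char *fromcode,
                    const char *in, size_t in_len,
                    char *out, size_t out_cap)
{
    iconv_t cd = iconv_open(tocode, fromcode);  /* target first */
    char *src = (char *) in;    /* iconv() wants char **, input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;
    size_t rc;

    if (cd == (iconv_t) -1)
        return (size_t) -1;

    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t) -1) {
        /* Flush any pending shift sequence for stateful targets. */
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);
    }
    iconv_close(cd);

    return rc == (size_t) -1 ? (size_t) -1 : out_cap - dst_left;
}
```

A production converter would call iconv() in a loop and grow the output
buffer when the call fails because the buffer is full; the sketch treats
that case as an error to stay short.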


Code set converters are brought under the framework of the 
iconv_open(), iconv(), and iconv_close() set of functions. With 
these functions, it is possible to provide and to use several different types of 
converters. Applications can call these functions to convert characters in 
one code set into characters in another code set. With the advent of the 
iconv framework, converters can be provided in a uniform manner. The 
access and use of these converters is being standardized under X/Open 
XPG4.

X Interclient (ICCCM) Conversion Functions


Xlib provides the following functions for doing conversions.


X ICCCM Multibyte Functions     X ICCCM Wide Character Functions

XmbTextPropertyToTextList()     XwcTextPropertyToTextList()
XmbTextListToTextProperty()     XwcTextListToTextProperty()


Note - The libXm library does provide the XmStringConvertToCT()
and XmStringConvertFromCT() functions; however, these are not
recommended because there are some hardcoded assumptions about certain
XmString tags. For example, if the tag is bold, XmStringConvertToCT()
is implementation-dependent. Across various platforms, the behavior of this
function cannot be guaranteed in all international regions.


Refer to "Interclient Communications Conventions for Localized Text" on
page 123 for more information.


Window Titles 


The standard way for setting titles is to use resources. But for applications
that set the titles of their windows directly, a localized title must be sent to
the Window Manager. Use the XCompoundTextStyle encoding defined in
XICCEncodingStyle, as well as the following guidelines:


* Compound text can be created either by
  XmbTextListToTextProperty() or
  XwcTextListToTextProperty().


* Localized titles can be displayed using the XmNtitle and
  XmNtitleEncoding resources of the WMShell widget. Localized icon
  names can be displayed using the XmNiconName and
  XmNiconNameEncoding resources of the TopLevelShell widget.


* Localized titles of dialog boxes can also be displayed using the
  XmNdialogTitle resource of the XmBulletinBoard widget.


© Window Manager should have an appropriate fontlist for displaying 
localized strings. 
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Mail Basic Interchange 
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Following is an example of displaying a localized title and icon name. 
Compound text is made from the compound string in this example. 


include <nl_types.h> 
Widget toplevel; 

Arg al[10]; 

int ac; 

XTextProperty title; 

char *localized_string; 
nl_catd fd; 


XtSetLanguageProc( NULL, NULL, NULL ); 

fd = catopen( "my_prog", 0 ); 

localized_string = catgets(fd, set_num, mes_num, "defaulttitle") ; 

XmbTextListToTextProperty( XtDisplay(toplevel), &localized_string, 
1, XCompoundTextStyle, &title); 

ac = 0; 

XtSetArg(al[ac], XmNtitle, title.value); act+t; 

XtSetArg(al[ac], XmNtitleEncoding, title.encoding); act+t; 

XtSetValues(toplevel, al, ac); 


If you are using a window rather than widgets, the 
XmbSetWMProperties () function automatically converts a localized string 
into the proper XICCEncodingStyle. 


In general, electronic mail (email) strategy has been one of turning email 
into a canonical, labeled format as opposed to optimizing a message given 
knowledge of the receiver’s locale. This means that in the email world, you 
should always assume that the receiver may be in a different locale. In the 
desktop world, the default email transport is Simple Mail Transfer Protocol 
(SMTP), which only supports 7-bit transmission channels. 


With this understanding, the email strategy for the desktop is as follows: 


* The sending agent, by default (unless instructed otherwise by the user),
converts a body part into a standard format for the sending transmission
channel and labels the body part with the character encoding used.


* The receiving agent looks at the body part to see if it can support the
character encoding; if it can, it converts the body part into the local
character set.


In addition, because the MIME format is used for messages, any 8-bit to
7-bit transformations are done using the built-in MIME transport encodings
(base64 or quoted-printable). See the Request for Comments (RFC) 1521
MIME standard specification.
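

To illustrate the 8-bit to 7-bit transformation, the following is a minimal sketch of base64 encoding in C. The function name base64_encode and its buffer contract are assumptions made for this example and are not part of any desktop API; a real mail agent would also break the output into lines as required by MIME, which this sketch omits.

```c
#include <stddef.h>

/* Hypothetical helper: encode n bytes from src as base64 into dst.
 * dst must have room for 4 * ((n + 2) / 3) + 1 bytes.
 * Returns the number of characters written, not counting the NUL. */
size_t base64_encode(const unsigned char *src, size_t n, char *dst)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;

    /* Every full group of three 8-bit bytes becomes four 6-bit symbols. */
    for (; i + 3 <= n; i += 3) {
        unsigned long v = ((unsigned long)src[i] << 16) |
                          ((unsigned long)src[i + 1] << 8) |
                          (unsigned long)src[i + 2];
        dst[o++] = alphabet[(v >> 18) & 0x3F];
        dst[o++] = alphabet[(v >> 12) & 0x3F];
        dst[o++] = alphabet[(v >> 6) & 0x3F];
        dst[o++] = alphabet[v & 0x3F];
    }

    /* One or two bytes left over: pad the last group with '='. */
    if (i < n) {
        unsigned long v = (unsigned long)src[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)src[i + 1] << 8;
        dst[o++] = alphabet[(v >> 18) & 0x3F];
        dst[o++] = alphabet[(v >> 12) & 0x3F];
        dst[o++] = (i + 1 < n) ? alphabet[(v >> 6) & 0x3F] : '=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}
```

Every output character comes from the 7-bit ASCII range, so the encoded body part can travel over a 7-bit SMTP channel regardless of the code set of the original text.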


Encodings and Code Sets


To understand code sets, it is necessary to first understand character sets. 
A character set is a collection of predefined characters based on the specific 
needs of one or more languages without regard to the encoding values used 
to represent the characters. The choice of which code set to use depends on 
the user's data processing requirements. A particular character set can be 
encoded using different encoding schemes. For example, the ASCII 
character set defines the set of characters found in the English language. 
The Japanese Industrial Standard (JIS) character set defines the set of
characters used in the Japanese language. Both the English and Japanese
character sets can be encoded using different code sets.


The ISO2022 standard defines a coded character set as a group of precise
rules that defines a character set and the one-to-one relationship between 
each character and its bit pattern. A code set defines the bit patterns that 
the system uses to identify characters. 


A code page is similar to a code set with the limitation that a code-page 
specification is based on a 16-column by 16-row matrix. The intersection of 
each column and row defines a coded character. 


Code Set Strategy


The common open software environment code set support is based on
International Organization for Standardization (ISO) and
industry-standard code sets that satisfy the data processing needs of
users.


Each locale in the system defines which code set it uses and how the
characters within the code set are manipulated. Because multiple locales
can be installed on the system, multiple code sets can be used by different
users on the system. While the system can be configured with locales using
different code sets, all system utilities assume that the system is running
under a single code set.


Most commands have no knowledge of the underlying code set being used
by the locale. The knowledge of code sets is hidden by the
code-set-independent library subroutines (internationalization libraries),
which pass information to the code-set-dependent subroutines.
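

As an illustration of asking the locale for its code set instead of hard-coding one, the following sketch uses the standard setlocale() and nl_langinfo() functions. The helper name codeset_of_locale is an assumption made for this example, and the spelling of the returned code set name is implementation-dependent.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch to the named locale ("" means "take the
 * locale from the environment") and return the name of its code set,
 * or NULL if the locale is not available on this system. */
const char *codeset_of_locale(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;                 /* locale not installed */
    return nl_langinfo(CODESET);     /* code set of the current locale */
}
```

A program might call codeset_of_locale("") once at startup and pass the result to its conversion routines, so that the rest of the code stays independent of the code set.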


Because many programs rely on ASCII, all code sets include the 7-bit 
ASCII code set as a proper subset. Because the 7-bit ASCII code set is 
common to all supported code sets, its characters are sometimes referred to 
as the portable character set. 


The 7-bit ASCII code set is based on the ISO646 definition and contains the
control characters, punctuation characters, digits (0-9), and the English 
alphabet in uppercase and lowercase. 


Code Set Structure


Each code set is divided into two principal areas:


* Graphic Left (GL) Columns 0-7
* Graphic Right (GR) Columns 8-F


The first two columns of each area are reserved by ISO standards for
control characters. The terms C0 and C1 are used to denote the control
characters for the Graphic Left and Graphic Right areas, respectively.


Note - The PC code sets use the C1 control area to encode graphic
characters.


The remaining six columns of each area are used to encode graphic
characters (see Table 3-2). Graphic characters are considered to be printable
characters, while the control characters are used by devices and
applications to indicate some special function.


Table 3-2 Code Set Overview

Columns  Area                Contents
0-1      C0 (Graphic Left)   Control characters
2-7      Graphic Left        7-bit ASCII
8-9      C1 (Graphic Right)  Control characters
A-F      Graphic Right       Code-set-unique characters

Each column contains 16 rows (0-F).


Control Characters 


Based on the ISO definition, a control character initiates, modifies, or stops 
a control operation. A control character is not a graphic character, but can 
have graphic representation in some instances. The control characters in 
the ISO646-IRV character set are present in all supported code sets, and the
encoded values of the C0 control characters are consistent throughout the
code sets. 


Graphic Characters 


Each code set can be considered to be divided into one or more character 
sets, such that each character is given a unique coded value. The ISO 
standard reserves six columns for encoding characters and does not allow 
graphic characters to be encoded in the control character columns. 


Single-Byte Code Sets


Code sets that use all 8 bits of a byte can support European, Middle
Eastern, and other alphabetic languages. Such code sets are called
single-byte code sets. A single-byte code set can encode at most 191
graphic characters, not including control characters.
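

The column layout described above can be sketched in C as follows. The names code_area, classify_byte, and count_graphic_positions are hypothetical. The count assumes that the space character (0x20) is counted as a graphic position and that DEL (0x7F) is not, which is one way to arrive at the 191-character limit: 95 positions in Graphic Left plus 96 in Graphic Right.

```c
/* Areas of an 8-bit code set, by column (the high 4 bits of the byte). */
enum code_area { AREA_C0, AREA_GL, AREA_C1, AREA_GR };

/* Classify a byte by the column it falls in. */
enum code_area classify_byte(unsigned char b)
{
    unsigned column = b >> 4;            /* column 0-F */

    if (column <= 0x1) return AREA_C0;   /* columns 0-1 */
    if (column <= 0x7) return AREA_GL;   /* columns 2-7 */
    if (column <= 0x9) return AREA_C1;   /* columns 8-9 */
    return AREA_GR;                      /* columns A-F */
}

/* Count the positions available for graphic characters.
 * Assumption: DEL (0x7F) is excluded even though it lies in column 7. */
int count_graphic_positions(void)
{
    int b, count = 0;

    for (b = 0; b <= 0xFF; b++) {
        enum code_area area = classify_byte((unsigned char)b);
        if ((area == AREA_GL || area == AREA_GR) && b != 0x7F)
            count++;
    }
    return count;
}
```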
Multibyte Code Sets


The term multibyte code sets is used to refer to all possible code sets 
regardless of the number of bytes needed to encode any specific character. 
Because the operating system should be capable of supporting any number 
of bits to encode a character, a multibyte code set may contain characters 
that are encoded with 8, 16, 32, or more bits. Even single-byte code sets are 
considered to be multibyte code sets. 


Extended UNIX Code (EUC) Code Set


The EUC code set uses control characters to identify characters in some of
the character sets. The encoding rules are based on the ISO2022 definition
for the encoding of 7-bit and 8-bit data.


The term EUC denotes these general encoding rules. A code set based on 
EUC conforms to the EUC encoding rules but also identifies the specific 
character sets associated with the specific instances. For example, eucJP
for Japanese refers to the encoding of the JIS characters according to the
EUC encoding rules.


The first set (CS0) always contains an ISO646 character set. All of the
other sets must have the most-significant bit (MSB) set to 1, and they can 
use any number of bytes to encode the characters. In addition, all 
characters within a set must have: 


* Same number of bytes to encode all characters 
* Same column display width (number of columns on a fixed-width 
terminal) 


Each character in the third set (CS2) is always preceded with the control
character SS2 (single-shift 2, 0x8e). Code sets that conform to EUC do not
use the SS2 control character other than to identify the third set.


Each character in the fourth set (CS3) is always preceded with the control
character SS3 (single-shift 3, 0x8f). Code sets that conform to EUC do not
use the SS3 control character other than to identify the fourth set.
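

The rules above are enough to tell, from the first byte of a character, which of the four sets it belongs to. The following is a minimal sketch; the function name euc_code_set is hypothetical, and treating the remaining C1 bytes (0x80-0x9F other than SS2 and SS3) as "not the start of a character" is a simplifying assumption of this example.

```c
/* Return 0, 1, 2, or 3 for CS0-CS3 according to the first byte of an
 * EUC character, or -1 if the byte is not treated as a lead byte here. */
int euc_code_set(unsigned char lead)
{
    if (lead < 0x80)  return 0;   /* MSB clear: CS0, the ISO646 set */
    if (lead == 0x8E) return 2;   /* SS2 introduces a CS2 character */
    if (lead == 0x8F) return 3;   /* SS3 introduces a CS3 character */
    if (lead >= 0xA0) return 1;   /* MSB set, no single shift: CS1 */
    return -1;                    /* other C1 bytes: assumption, see above */
}
```

How many bytes follow the lead byte depends on the specific code set, because each EUC code set fixes its own byte count per set.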
ISO EUC Code Sets


The following code sets are based on definitions set by the International 
Organization for Standardization (ISO). 


* ISO646-IRV
* ISO8859-1
* ISO8859-x
* eucJP
* eucTW
* eucKR


ISO646-IRV


The ISO646-IRV code set defines the code set used for information
processing based on a 7-bit encoding. The character set associated with this 
code set is derived from the ASCII characters. 


ISO8859-1 


ISO8859-1 encoding is a single-byte encoding that is based on and is
compatible with other ISO, American National Standards Institute (ANSI),
and European Computer Manufacturers Association (ECMA) code
extension techniques. The ISO8859 encoding defines a family of code sets
with each member containing its own unique character sets. The 7-bit
ASCII code set is a proper subset of each of the code sets in the ISO8859
family.


The ISO8859-1 code set is called the ISO Latin-1 code set and consists of
two character sets:


* ISO646-IRV Graphic Left, 7-bit ASCII character set
* ISO8859-1 Graphic Right (Latin) character set


These character sets combined include the characters necessary for 
Western European languages such as Danish, Dutch, English, Finnish, 
French, German, Icelandic, Italian, Norwegian, Portuguese, Spanish, and 
Swedish. 


While the ASCII code set defines an order for the English alphabet, the
Graphic Right (GR) characters are not ordered according to any specific
language. The language-specific ordering is defined by the locale.


Other ISO8859 Code Sets


This section lists the other significant ISO8859 code sets. Each code set
includes the ASCII character set plus its own unique characters.


ISO8859-2
Latin alphabet, No. 2, Eastern Europe


* Albanian
* Czechoslovakian
* English
* German
* Hungarian
* Polish
* Rumanian
* Serbo-Croatian
* Slovak
* Slovene


ISO8859-5
Latin/Cyrillic alphabet


* Bulgarian
* Byelorussian
* English
* Macedonian
* Russian
* Ukrainian


ISO8859-6
Latin/Arabic alphabet


* English
* Arabic


ISO8859-7
Latin/Greek alphabet


* English
* Greek




ISO8859-8
Latin/Hebrew alphabet


* English
* Hebrew


ISO8859-9
Latin/Turkish alphabet


* Danish
* Dutch
* English
* Finnish
* French
* German
* Irish
* Italian
* Norwegian
* Portuguese
* Spanish
* Swedish
* Turkish


eucJP


The EUC for Japanese consists of single-byte and multibyte characters (2
and 3 bytes). The encoding conforms to ISO2022 and is based on JIS and
EUC definitions (see Table 3-3).


Table 3-3 Encoding for eucJP

CS   Encoding                     Character Set
cs0  0xxxxxxx                     ASCII
cs1  1xxxxxxx 1xxxxxxx            JIS X0208-1990
cs2  0x8E 1xxxxxxx                JIS X0201-1976
cs3  0x8F 1xxxxxxx 1xxxxxxx       JIS X0212-1990


JIS X0208-1990


A code of the Japanese graphic character set for information interchange
(1990 version) that contains 147 special characters, 10 numeric digits, 83
Hiragana characters, 86 Katakana characters, 52 Latin characters, 48
Greek characters, 66 Cyrillic characters, 32 line-drawing elements, and
6355 Kanji characters.


JIS X0201


A code for information interchange that contains 63 Katakana characters.


JIS X0212-1990


A code of the supplementary Japanese graphic character set for information
interchange (1990 version) that contains 21 additional special characters,
21 additional Greek characters, 26 additional Cyrillic characters, 27
additional Latin characters, 171 Latin characters with diacritical marks,
and 5801 additional Kanji characters.
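

The encoding in Table 3-3 makes it possible to step through eucJP text one character at a time. The following sketch counts the characters in a string; the function names eucjp_char_len and eucjp_strlen are hypothetical, and restricting cs1 lead bytes to the range 0xA1-0xFE is an assumption of this example (the table itself only requires the most-significant bit to be set).

```c
#include <stddef.h>

/* Number of bytes in the eucJP character that starts with lead,
 * or 0 if lead is not accepted as the start of a character. */
size_t eucjp_char_len(unsigned char lead)
{
    if (lead < 0x80)  return 1;                   /* cs0: ASCII */
    if (lead == 0x8E) return 2;                   /* cs2: SS2 + 1 byte */
    if (lead == 0x8F) return 3;                   /* cs3: SS3 + 2 bytes */
    if (lead >= 0xA1 && lead <= 0xFE) return 2;   /* cs1 (assumed range) */
    return 0;
}

/* Count the characters in a NUL-terminated eucJP string.
 * Returns (size_t)-1 if the string is malformed or truncated. */
size_t eucjp_strlen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        size_t i, len = eucjp_char_len(*p);

        if (len == 0)
            return (size_t)-1;
        /* Every trailing byte must have its MSB set; this also
         * catches a NUL that cuts a character short. */
        for (i = 1; i < len; i++)
            if (p[i] < 0x80)
                return (size_t)-1;
        p += len;
        count++;
    }
    return count;
}
```

Portable applications would normally rely on the locale-aware mblen() or mbstowcs() functions instead of code-set-specific logic; the sketch only makes the table concrete.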


eucTW


The EUC for Traditional Chinese is an encoding consisting of single-byte
and multibyte (2 and 4 bytes) characters. The EUC encoding conforms to
ISO2022 and is based on the Chinese National Standard (CNS) as defined by
the Republic of China and the EUC definition (see Table 3-4).


Table 3-4 Encoding for eucTW

CS   Encoding                            Character Set
cs0  0xxxxxxx                            ASCII
cs1  1xxxxxxx 1xxxxxxx                   CNS 11643.1992 - plane 1
cs2  0x8E 0xA2 1xxxxxxx 1xxxxxxx         CNS 11643.1992 - plane 2
cs3  0x8E 0xA3 1xxxxxxx 1xxxxxxx         CNS 11643.1992 - plane 3
     0x8E 0xB0 1xxxxxxx 1xxxxxxx         CNS 11643.1992 - plane 16


CNS 11643-1992 defines 16 planes for the Chinese Standard Interchange
Code; each plane can support up to 8836 characters (94x94). Currently, only
planes 1 through 7 have characters assigned. Table 3-5 shows the 16 planes
of the CNS 11643-1992 standard.


Table 3-5 16 Planes of the CNS 11643-1992 Standard

Plane  Definition                     # of Characters  EUC Encoding
1      Most frequently used           6085             A1A1 - FDCB
2      Secondary frequently used      7650             8EA2 A1A1 - 8EA2 F2C4
3      Exec. Yuen EDP (1) center      6148             8EA3 A1A1 - 8EA3 E2C6
4      RIS (2), vendor defined        7298             8EA4 A1A1 - 8EA4 EEDC
5      Rarely used by MOE (3)         8603             8EA5 A1A1 - 8EA5 FCD1
6      Variation char set 1 by MOE    6388             8EA6 A1A1 - 8EA6 E4FA
7      Variation char set 2 by MOE    6539             8EA7 A1A1 - 8EA7 E6D5
8      Undefined                      0                8EA8 A1A1 - 8EA8 FEFE
9      Undefined                      0                8EA9 A1A1 - 8EA9 FEFE
10     Undefined                      0                8EAA A1A1 - 8EAA FEFE
11     Undefined                      0                8EAB A1A1 - 8EAB FEFE
12     User Defined Character (UDC)   0                8EAC A1A1 - 8EAC FEFE
13     UDC                            0                8EAD A1A1 - 8EAD FEFE
14     UDC                            0                8EAE A1A1 - 8EAE FEFE
15     UDC                            0                8EAF A1A1 - 8EAF FEFE
16     UDC                            0                8EB0 A1A1 - 8EB0 FEFE


1. EDP: Center of Directorate, General of Budget, Accounting, and Statistics
2. RIS: Residence Information System
3. MOE: Ministry of Education
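

The 4-byte encodings in Table 3-5 follow a regular pattern: the second byte is 0xA0 plus the plane number, and the last two bytes each select one of 94 positions (0xA1-0xFE), which gives the 94 x 94 = 8836 characters per plane. The following sketch decodes such a sequence. The function name euctw_decode4 and the row/column terminology are assumptions of this example, and only the 4-byte forms listed in the table (planes 2 through 16) are accepted.

```c
/* Decode a 4-byte eucTW sequence into plane (2-16), row (1-94), and
 * column (1-94). Returns 0 on success, -1 if p does not hold a 4-byte
 * form listed in Table 3-5. */
int euctw_decode4(const unsigned char *p, int *plane, int *row, int *col)
{
    if (p[0] != 0x8E)                 return -1;  /* must start with SS2 */
    if (p[1] < 0xA2 || p[1] > 0xB0)   return -1;  /* planes 2 through 16 */
    if (p[2] < 0xA1 || p[2] > 0xFE)   return -1;  /* 94 possible rows */
    if (p[3] < 0xA1 || p[3] > 0xFE)   return -1;  /* 94 possible columns */

    *plane = p[1] - 0xA0;
    *row   = p[2] - 0xA0;
    *col   = p[3] - 0xA0;
    return 0;
}
```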


eucKR


The EUC for Korean is an encoding consisting of single-byte and multibyte
characters (shown in Table 3-6). The encoding conforms to ISO2022 and is
based on Korean Standard Code (KSC) set and EUC definitions.


Table 3-6 Encoding for eucKR

CS   Encoding                Character Set
cs0  0xxxxxxx                ASCII
cs1  1xxxxxxx 1xxxxxxx       KSC 5601-1992
cs2  Not used
cs3  Not used


KSC 5601-1992 (code of the Korean character set for information
interchange, 1992 version) contains 432 special characters, 30 Arabic and
Roman numeral characters, 94 Hangul alphabet characters, 52 Roman
characters, 48 Greek characters, 27 Latin characters, 169 Japanese
characters, 66 Russian characters, 68 line-drawing elements, 2344
precomposed Hangul characters, and 4888 Hanja characters.


One Hangul character can be composed of several consonants and vowels.
Most Hangul words can be expressed in Hanja words. Hanja is a set of
Traditional Chinese characters that is currently used by Korean people.
Each Hanja character has its own meaning and is thus more specific than
Hangul most of the time.


This chapter discusses tasks related to internationalizing with Motif.


Locale Management
Font Management
Font List Syntax
Drawing Localized Text
Inputting Localized Text
Internationalized User Interface Language


Locale Management


The term language environment refers to the set of localized data that the
application needs to run correctly in the user-specified locale. A language
environment supplies the rules associated with a specific language. In
addition, the language environment consists of any externally stored data,
such as localized strings or text used by the application. For example, the
menu items displayed by an application might be stored in separate files for
each language supported by the application. This type of data can be stored
in resource files, User Interface Definition (UID) files, or message catalogs
(on XPG3-compliant systems).


A single language environment is established when an application runs.
The language environment in which an application operates is specified by
the application user, often either by setting an environment variable (LANG
or LC_* on POSIX-based systems) or by setting the xnlLanguage resource.
The application then sets the language environment based on the user's
specification. The application can do this by using the setlocale()
function in a language procedure established by the
XtSetLanguageProc() function. This causes Xt to cache a per-display
language string that is used by the XtResolvePathname() function to find
resource, bitmap, and User Interface Language (UIL) files.


An application that supplies a language procedure can either provide its 
own procedure or use an Xt default procedure. In either case, the 
application establishes the language procedure by calling the 
XtSetLanguageProc() function before initializing the toolkit and before 
loading the resource databases (such as by calling the 
XtAppInitialize() function). When a language procedure is installed, Xt 
calls it in the process of constructing the initial resource database. Xt uses 
the value returned by the language procedure as its per-display language 
string. 


The default language procedure performs the following tasks: 

* Sets the locale. This is done by using:

    setlocale(LC_ALL, language);

where language is the value of the xnlLanguage resource, or the empty
string ("") if the xnlLanguage resource is not set. When the
xnlLanguage resource is not set, the locale is generally derived from an
environment variable (LANG on POSIX-based systems).


* Calls the XSupportsLocale() function to verify that the locale just set
is supported. If not, a warning message is issued and the locale is set to
C.


* Calls the XSetLocaleModifiers() function specifying the empty
string.


* Returns the value of the current locale. On ANSI C-based systems, this
is the result of calling:

    setlocale(LC_ALL, NULL);
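

The locale-related steps in the list above can be sketched in plain C as follows. This is not the Xt implementation: the function name set_language is hypothetical, the XSupportsLocale() and XSetLocaleModifiers() steps are left out so that the sketch has no X dependency, and falling back to the C locale when setlocale() itself fails is an assumption that mirrors the fallback described for XSupportsLocale().

```c
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the locale handling of a language procedure.
 * language plays the role of the xnlLanguage value; NULL or "" means
 * "derive the locale from the environment". */
const char *set_language(const char *language)
{
    if (language == NULL)
        language = "";

    /* Step 1: set the locale. */
    if (setlocale(LC_ALL, language) == NULL) {
        /* Assumed fallback: warn and use the C locale. */
        fprintf(stderr, "warning: locale \"%s\" not supported, using C\n",
                language);
        setlocale(LC_ALL, "C");
    }

    /* Final step: return the current locale, which a language
     * procedure would hand back as the language string. */
    return setlocale(LC_ALL, NULL);
}
```

A real language procedure has the XtLanguageProc signature and is registered with XtSetLanguageProc() before the toolkit is initialized.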


The application can use the default language procedure by making the call
to the XtSetLanguageProc() function in the following manner:


    XtSetLanguageProc(NULL, NULL, NULL);

    toplevel = XtAppInitialize(...);


By default, Xt does not install any language procedure. If the application
does not call the XtSetLanguageProc() function, Xt uses as its
per-display language string the value of the xnlLanguage resource if it is
set. If the xnlLanguage resource is not set, Xt derives the language string
from the LANG environment variable.


Note - The per-display language string that results from this process is
implementation-dependent, and Xt provides no public means of examining
the language string once it is established.


By supplying its own language procedure, an application can use any
procedure it wants for setting the language string.


Font Management


The desktop uses font lists to display text. A font defines a set of glyphs 
that represent the characters in a given character set. A font set is a group 
of fonts that are needed to display text for a given locale or language. A font 
list is a list of fonts, font sets, or a combination of the two, that may be 
used. Motif has convenience functions to create a font list. 


Font List Structure


The desktop requires a font list for text display. A font list is a list of font 
structures, font sets, or both, each of which has a tag to identify it. A font 
set ensures that all characters in the current language can be displayed. 
With font structures, the responsibility for ensuring that all characters can 
be displayed rests with the programmer (including converting from the code 
set of the locale to glyph indexes). 


Each entry in a font list is in the form of a {tag, element} pair, where
element can be either a single font or a font set. The application can create
a font list entry from either a single font or a font set. For example, the
following code segment creates a font list entry for a font set:


    char font1[] =
        "-adobe-courier-medium-r-normal--10-100-75-75-M-60";

    font_list_entry = XmFontListEntryLoad(displayID, font1,
                                          XmFONT_IS_FONTSET, "font_tag");


The XmFontListEntryLoad() function loads a font or creates and loads a 
font set. The following are the four arguments to the function: 


displayID Display on which the font list is to be used.


fontname A string that represents either a font name or a base font 
name list, depending on the nametype argument. 


nametype A value that specifies whether the fontname argument 
refers to a font name or a base font name list. 


tag A string that represents the tag for this font list entry. 


If the nametype argument is XmFONT_IS_FONTSET, the 
XmFontListEntryLoad() function creates a font set in the current locale 
from the value in the fontname argument. The character sets of the fonts 
specified in the font set are dependent on the locale. If nametype is 
XmFONT_IS_FONT, the XmFontListEntryLoad() function opens the font 
found in fontname. In either case, the font or font set is placed into a font 
list entry. 


The following code example creates a new font list and appends the entry
font_list_entry to it:


    XmFontList       font_list;
    XmFontListEntry  font_list_entry;

    /* A NULL first argument creates a new font list. */
    font_list = XmFontListAppendEntry(NULL, font_list_entry);

    /* XmFontListEntryFree() takes the address of the entry. */
    XmFontListEntryFree(&font_list_entry);


Once a font list has been created, the XmFontListAppendEntry()
function adds a new entry to it. The following example uses the
XmFontListEntryCreate() function to create a new font list entry for an
existing font list.


    XFontSet         font2;
    char            *font_tag;
    XmFontListEntry  font_list_entry2;

    font_list_entry2 = XmFontListEntryCreate(font_tag,
                                             XmFONT_IS_FONTSET,
                                             (XtPointer) font2);


The font2 parameter specifies an XFontSet returned by the
XCreateFontSet() function. The arguments to the
XmFontListEntryCreate() function are font_tag,
XmFONT_IS_FONTSET, and font2, which are the tag, type, and font,
respectively. The tag and the font set are the {tag, element} pair of the font
list entry.


To add this entry to the font list, use the XmFontListAppendEntry()
function again, only this time, its first parameter specifies the existing font
list.


    font_list = XmFontListAppendEntry(font_list, font_list_entry2);
    XmFontListEntryFree(&font_list_entry2);


Font Lists Examples 


The syntax for specifying a font list in a resource file depends on whether 
the list contains fonts, font sets, or both. 


Obtaining a Font 
To obtain a font, specify a font and an optional font list element tag. 


* If the tag is present, it should be preceded by an = (equal sign).
* If the tag is not present, do not use an = (equal sign).


Entries specifying more than one font are separated by a , (comma). 


Obtaining a Font Set 


To obtain a font set, specify a base font list and an optional font list 
element tag. 


* If the tag is present, it should be preceded by a : (colon) instead of an =
(equal sign).

* If the tag is not present, the colon must still be present as this is what
distinguishes a font from a font set in the resource declaration.


Fonts specified in the base font list are separated by a ; (semicolon). Entries
specifying more than one font set are separated by a , (comma).


Specifying a Font When the Font List Element Tag Is Absent


If the font list element tag is not present, the default
XmFONTLIST_DEFAULT_TAG is used. Here are some examples.


* Specifying a font using the default font list element tag:


    *fontList: fixed
    *fontList: \
    -adobe-courier-medium-r-normal--10-100-75-75-M-60-iso8859-1


* Specifying a font list element tag:


    *fontList: fixed=ROMAN, 8x13bold=BOLD


* Specifying two fonts, one with the default font list element tag and one
with an explicit tag:


    *fontList: fixed, 8x13bold=BOLD


Specifying a Font Set When the Font List Element Tag Is Absent


If the font list element tag is not present, the default
XmFONTLIST_DEFAULT_TAG is used. Here are some examples of specifying
a font set.


* Let Xlib select the fonts without specifying a font list element tag (the
trailing colon marks the entry as a font set):


    *fontList: -dt-application-medium-r-normal-*-m*-*-*-*-m-*:


* Let Xlib select the fonts and specify a font list element tag as MY_TAG:


    *fontList: -dt-application-medium-r-normal-*-m*-*-*-*-m-*:MY_TAG


* Let Xlib select the fonts, specify a font list element tag for bold fonts,
and use the default font list element tag for the others:


    *fontList: -dt-application-medium-r-normal-*-m*-*-*-*-m-*:,\
    -dt-application-medium-r-normal-style2-m*-*-*-*-m-*:BOLD


Font List Syntax 


The XmFontList data type can contain one or more entries that are
associated with one of the following elements:


XFontStruct     An X font that can be used to draw text encoded in
                the charset of the font, that is, font-encoded text.


XFontSet        A collection of XFontStruct fonts used to draw text
                encoded in a locale, that is, localized text.


The following syntax is used by the string-to-XmFontList converter:


    XmFontList     := <fontentry> {',' <fontentry>}

    fontentry      := <fontname><fontid>
                    | <baselist><fontsetid>

    baselist       := <fontname>{';'<fontname>}
    fontsetid      := ':'<string> | <defaultfontset>
    fontname       := <XLFD string>
    fontid         := '='<string> | <defaultfont>

    XLFD string    := refer to XLFD Specification

    defaultfont    := NULL
    defaultfontset := ':'NULL
    string         := any character from ISO646IRV, except newline
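

To make the grammar concrete, the following sketch classifies a single fontentry (the text between two commas) as a font or a font set and extracts its tag. It is not the Motif converter: the names font_entry and parse_font_entry and the fixed buffer sizes are assumptions of this example, an empty tag stands in for XmFONTLIST_DEFAULT_TAG, and surrounding white space is assumed to have been trimmed already.

```c
#include <stddef.h>
#include <string.h>

/* Result of parsing one fontentry (hypothetical type). */
struct font_entry {
    int  is_font_set;   /* 1 if the entry contains a ':' */
    char name[256];     /* font name, or base font name list */
    char tag[64];       /* "" means the default tag */
};

/* Copy n characters of src into dst, truncating to fit dstsize. */
static void copy_span(char *dst, size_t dstsize, const char *src, size_t n)
{
    if (n >= dstsize)
        n = dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Parse one fontentry. A ':' marks a font set; otherwise an optional
 * '=' introduces the tag of a single font. */
void parse_font_entry(const char *entry, struct font_entry *out)
{
    const char *colon = strrchr(entry, ':');
    const char *equal = strrchr(entry, '=');
    const char *sep   = (colon != NULL) ? colon : equal;

    out->is_font_set = (colon != NULL);
    if (sep == NULL) {
        /* Plain font name with the default tag. */
        copy_span(out->name, sizeof out->name, entry, strlen(entry));
        out->tag[0] = '\0';
    } else {
        copy_span(out->name, sizeof out->name, entry, (size_t)(sep - entry));
        copy_span(out->tag, sizeof out->tag, sep + 1, strlen(sep + 1));
    }
}
```

Giving the colon precedence over the equal sign reflects the rule that the colon is what distinguishes a font set from a font in the resource declaration.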


A fontentry within a given XmFontList can specify either a font or a font
set. In either case, the ID (fontid or fontsetid) can be referenced by a
segment within a compound string (XmString).


Both defaultfont and defaultfontset can define the default 
fontentry, yet there can only be one default per XmFontList. 


The XmFONTLIST_DEFAULT_TAG identifier always references the default 
fontentry when XmString is drawn. If the default fontentry is not 
specified, the first fontentry is used to draw. 


The resource converter operates under a single locale so that all font sets 
created are associated with the same locale. 


Note - Some implementations reserve the code set name of a locale as a
special charset ID (fontsetid and fontid) within an XmFontList string.
For this reason, application developers are cautioned not to use code set
names if they want their applications to be portable across platforms.


Drawing Localized Text


A compound string is a means of encoding text so that it can be displayed 
in many different fonts without changing anything in the program. The 
desktop uses compound strings to display all text except that in the Text 
and TextField widgets. This section explains the structure of a compound 
string, the interaction between it and a font list (which determines how the 
compound string is displayed), and focuses on those aspects that are 
important to the internationalization process. 


Compound String Components


A compound string is an internal encoding, consisting of tag-length-value 
segments. Semantically, a compound string has components that contain 
the text to be displayed, a tag (called a font list element tag) that is 
matched with an element of a font list, and an indicator denoting the 
direction in which it is to be displayed. 


A compound string component can be one of the following four types:

* A font list element tag.
  - The font list element tag XmFONTLIST_DEFAULT_TAG indicates that
    the text is encoded in the code set of the current locale.
  - Other font list element tags are used later to match text with
    particular entries in a font list.
* A direction identifier.
* The text of the string. For internationalized applications, the text falls
  into two broad categories: either the text requires localized treatment or
  it does not.
* A separator.


The following describes each of the compound string components:


Font list element tag Indicates a string value that correlates the text 
component of a compound string to a font or a font 
set in a font list. 


Direction Indicates the relationship between the order in
which characters are entered on the keyboard and
the order in which the characters are displayed on
the screen. For example, the display order is left-to-right
in English, French, German, and Italian, and
right-to-left in Hebrew and Arabic.


Text Indicates the text to be displayed. 


Separator Indicates a special form of a compound string 
component that has no value. It is used to separate 
other segments. 


The desktop uses the specified font list element tag identified in the text 
component to display the compound string. A specified font list element tag 
is used until a new font list element tag is encountered. The desktop 
provides a special font list element tag, XmFONTLIST_DEFAULT_TAG, that 
matches a font that is correct for the current code set. It identifies the 
default entry in a font list. See "Compound Strings and Font Lists"
for more information.


The direction segment of a compound string specifies the direction in which 
the text is displayed. Direction can be left-to-right or right-to-left. 


Compound Strings and Resources


Compound strings are used to display all text except that in the Text and 
TextField widgets. The compound string is set into the appropriate 
widget resource so that it can be displayed. For example, the label for the 
PushButton widget is inherited from the Label widget, and the resource 
is XmNlabelString, which is of type XmString. This means that the 
resource expects a value that is a compound string. A compound string can 
be created with a program or defined in a resource file. 


Setting a Compound String Programmatically


An application can set this resource programmatically by creating the 
compound string using the XmStringCreateLocalized() compound 
string convenience function. 


This function creates a compound string in the encoding of the current
locale and automatically sets the font list entry tag to
XmFONTLIST_DEFAULT_TAG.


The following code segment shows one way to set the XmNlabelString
resource for a push button using a program.


    #include <nl_types.h>

    Widget    button;
    Arg       args[10];
    int       n;
    XmString  button_label;
    nl_catd   my_catd;

    (void) XtSetLanguageProc(NULL, NULL, NULL);

    /* Open the message catalog; "my_prog" is a placeholder name. */
    my_catd = catopen("my_prog", 0);

    /* XmStringCreateLocalized() takes only the text; the tag is
       always XmFONTLIST_DEFAULT_TAG. */
    button_label = XmStringCreateLocalized(catgets(my_catd, 1, 1,
                                                   "default label"));

    /* Create an argument list for the button */
    n = 0;
    XtSetArg(args[n], XmNlabelString, button_label); n++;

    /* Create and manage the button */
    button = XmCreatePushButton(toplevel, "button", args, n);
    XtManageChild(button);
    XmStringFree(button_label);


Setting a Compound String in a Defaults File


In an internationalized program, the label string for the button label 
should be obtained from an external source. For example, the button label 
can come from a resource file instead of the program. For this example, 
assume that the push button is a child of a Form widget called form1.


    *form1.button.labelString: Push Here


Here, the desktop's string-to-compound-string converter produces a
compound string from the resource file text. This converter always uses
XmFONTLIST_DEFAULT_TAG.


Compound Strings and Font Lists


When the desktop displays a compound string, it associates each segment
with a font or font set by means of the font list element tag for that
segment. The application must have loaded the desired font or font set,
created a font list that contains that font or font set and its associated font
list element tag, and created the compound string segment with the same
tag.


The desktop follows a set search procedure when it binds a compound 
string to a font list entry in this way: 


1. The desktop searches the font list for an exact match with the font list 
element tag specified in the compound string. If it finds a match, the 
compound string is bound to that font list entry. 


2. If this does not provide a binding between the compound string and the 
font list, the desktop binds the compound string to the first element in 
the font list, regardless of its font list element tag. 
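

The two-step search above can be modeled in a few lines of C. This is a sketch of the lookup logic only, not Motif code: the type font_list_entry here is a plain struct invented for the example (it is unrelated to XmFontListEntry), fonts are represented by name strings, and the backward-compatibility rule for default tags is not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Simplified model of a font list entry (hypothetical type). */
struct font_list_entry {
    const char *tag;    /* font list element tag */
    const char *font;   /* stand-in for the font or font set */
};

/* Return the index of the entry that a segment with the given tag is
 * bound to, or -1 if the font list is empty. */
int bind_tag(const struct font_list_entry *list, size_t n, const char *tag)
{
    size_t i;

    /* Step 1: look for an exact match on the tag. */
    for (i = 0; i < n; i++)
        if (strcmp(list[i].tag, tag) == 0)
            return (int)i;

    /* Step 2: no match, so bind to the first entry regardless of tag. */
    return (n > 0) ? 0 : -1;
}
```

Because of the second step, a misspelled tag does not produce an error; the text is simply drawn with the first font in the list, which can make such mistakes hard to notice.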


For backward compatibility, if an exact match is not found, a value of 
XmFONTLIST_DEFAULT_TAG in either a compound string or a font list 
matches the tag that results from creating a compound string or font list 
entry with a tag of XmSTRING_DEFAULT_CHARSET.


Figure 4-1 shows the relationships between a compound string,
a font set, and a font list when the font list element tag is set to something
other than XmFONTLIST_DEFAULT_TAG.


Compound String Components

Font List Element Tag    Text
tagb                     "Push Here"


Font List

Font_Set_A
Font_Set_B
Font_Set_C
Font_Set_D


Figure 4-1 Relationships between compound strings, font sets, and font lists when
the font list element tag is not XmFONTLIST_DEFAULT_TAG


The following example shows how to use a tag called tagb. 


XFontSet font1;
XmFontListEntry font_list_entry;
XmFontList font_list;
XmString label_text;
char **missing;
int missing_cnt;
char *def_string;
char *tagb;         /* Font list element tag */
char *fontx;        /* Initialize to XLFD or font alias */
char *button_label; /* Contains button label text */


font1 = XCreateFontSet (XtDisplay(toplevel), fontx, &missing,
                        &missing_cnt, &def_string);
font_list_entry = XmFontListEntryCreate (tagb, XmFONT_IS_FONTSET,
                                         (XtPointer) font1);
font_list = XmFontListAppendEntry (NULL, font_list_entry);
XmFontListEntryFree (&font_list_entry);


label_text = XmStringCreate (button_label, tagb);


The XCreateFontSet() function loads the font set and the
XmFontListEntryCreate() function creates a font list entry. The
application must create an entry and append it to an existing font list or
create a new font list. In either case, use the XmFontListAppendEntry()
function. Because there is no font list in place, the preceding code example
has a NULL value for the font list argument. The
XmFontListAppendEntry() function creates a new font list called
font_list with a single entry, font_list_entry. To add another entry to
font_list, follow the same procedure but supply a nonnull font list
argument.


Figure 4-2 shows the relationships between a compound string, a font set,
and a font list when the font list element tag is set to
XmFONTLIST_DEFAULT_TAG. In this case, the value field is locale text.


Compound String Components

Font List Element Tag     Text
XmFONTLIST_DEFAULT_TAG    "Push Here"


Font List

Font_Set_A    taga
Font_Set_B    tagb
Font_Set_C    XmFONTLIST_DEFAULT_TAG
Font_Set_D    tagc


Font_Set_C

font1C
font2C
font3C


Figure 4-2 Relationships between compound strings, font sets, and font lists when
a font list element tag is set to XmFONTLIST_DEFAULT_TAG


Here, the default tag points to Font_Set_C, which in turn identifies the
fonts needed to display the characters in the language.


Text and TextField Widgets and Font Lists


The Text and TextField widgets display text information. To do so, they 
must be able to select the correct font in which to display the information. 
The Text and TextField widgets follow a set search pattern to find the 
correct font as follows: 


1. The widget searches the font list for an entry that is a font set and has a 
font list element tag of XmFONTLIST_DEFAULT_TAG. If a match is found, 
it uses that font list entry. No further searching occurs. 


2. The widget searches the font list for an entry that specifies a font set. It 
uses the first one found. 


3. If no font set is found, the widget uses the first font in the font list. 


Using a font set ensures that there are glyphs for every character in the 
locale. 
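

The three-step search pattern can be sketched as a small selection function. This is a plain C model rather than widget code: struct text_font_entry, select_text_font(), and the MODEL_DEFAULT_TAG placeholder are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder standing in for the value of XmFONTLIST_DEFAULT_TAG. */
#define MODEL_DEFAULT_TAG "MODEL_DEFAULT_TAG"

enum entry_kind { ENTRY_FONT, ENTRY_FONT_SET };

/* Hypothetical stand-in for a font list entry. */
struct text_font_entry {
    enum entry_kind kind;
    const char *tag;
    const char *name;
};

/* Returns the entry a Text or TextField widget would pick, or NULL
 * when the list is empty. */
const struct text_font_entry *
select_text_font(const struct text_font_entry *list, size_t count)
{
    size_t i;

    /* Step 1: a font set carrying the default tag wins outright. */
    for (i = 0; i < count; i++) {
        if (list[i].kind == ENTRY_FONT_SET &&
            strcmp(list[i].tag, MODEL_DEFAULT_TAG) == 0)
            return &list[i];
    }
    /* Step 2: otherwise the first font set, whatever its tag. */
    for (i = 0; i < count; i++) {
        if (list[i].kind == ENTRY_FONT_SET)
            return &list[i];
    }
    /* Step 3: no font set at all, so the first entry is the first font. */
    return count > 0 ? &list[0] : NULL;
}
```

Note that step 1 ignores list order: a default-tagged font set is chosen even when other font sets precede it.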


Inputting Localized Text


In the system environment, the VendorShell widget class is enhanced to
provide the interface to the input method. While the VendorShell class
controls only one child widget in its geometry management, an extension
has been added to the VendorShell class to enhance it for managing all
components necessary in the interface to an input method. These
components include the status area, preedit area, and the MainWindow
area.


When the input method requires a status area or a preedit area or both, the
VendorShell widget automatically instantiates the status and preedit
areas and manages their geometry layout. Any status area or preedit area
is managed by the VendorShell widget internally and is not accessible by
the client. The widget instantiated as the child of the VendorShell widget
is called the MainWindow area.


The input method to be used by the VendorShell widget is determined by
the XmNinputMethod resource; for example, @im=alt. The default value of
NULL selects the default input method associated with the locale
at the time that VendorShell is created. As such, the user can affect which
input method is selected by setting the locale, setting the
XmNinputMethod resource, or setting both. The locale name is
concatenated with the XmNinputMethod resource to determine the input
method name. The locale name must not be specified in this resource. The
modifier name for the XmNinputMethod resource needs to be in the form
@im=modifier, where modifier is the string used to qualify which input
method is selected.
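

As a rough illustration of the naming rule, the following C sketch checks that a modifier has the @im=modifier form and joins it to a locale name. It assumes the concatenation is plain string concatenation (locale name first, modifier second); build_input_method_name() is a hypothetical helper and not part of any toolkit interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Writes "<locale><modifier>" into out.
 * Returns 0 on success, or -1 when the modifier does not have the
 * "@im=modifier" form or the result does not fit in out.
 */
int build_input_method_name(const char *locale, const char *modifier,
                            char *out, size_t out_size)
{
    int n;

    /* The resource value must start with "@im=" and name a modifier. */
    if (strncmp(modifier, "@im=", 4) != 0 || modifier[4] == '\0')
        return -1;

    n = snprintf(out, out_size, "%s%s", locale, modifier);
    if (n < 0 || (size_t) n >= out_size)
        return -1;
    return 0;
}
```

For the locale ja_JP and the resource value @im=alt, this model yields ja_JP@im=alt; a value such as alt, which lacks the @im= prefix, is rejected.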


The VendorShell widget can support multiple widgets that can share the
input method. Yet only one widget can have the keyboard focus (for
example, receive key press events and send them to an input method) at
any given time. To support multiple widgets (such as Text widgets), the
widgets need to be descendants of the VendorShell widget.


Note - The VendorShell widget class is a superclass of the
TransientShell and TopLevelShell widget classes. As such, an
instantiation of a TopLevelShell or a DialogShell is essentially an
instantiation of a VendorShell widget class.


The VendorShell widget behaves as an input manager only if one of its
descendants is an XmText[Field] instance. As soon as an
XmText[Field] instance is created as a descendant of the VendorShell
widget, VendorShell creates the necessary areas required by the
particular input methods dictated by the current locale. Even if an
XmText[Field] instance is not mapped but just created, VendorShell has
the geometry management behavior as described previously.


A VendorShell widget does the following:


* Enables applications to process multibyte character input and output
that is supported by the locales installed in the system.


* Manages an input method instance as defined in the XmIm reference
functions.


* Supports preediting within a preedit area in OffTheSpot,
OverTheSpot, Root, or None mode. Localized text can be entered into any
Text child widget in a multiple Text children widget tree by changing
the focus.


* Provides geometry management for descendant child widgets.


Geometry Management


The VendorShell widget provides geometry management and focus
management for the input method's user interface components, as
necessary. If the locale warrants it (for example, if the locale is a Japanese
Extended UNIX Code (EUC) locale), the VendorShell widget
automatically allocates and manages the geometry of any required preedit
area or status area or both.


Depending on the current preediting being done, an auxiliary area may be
required. If so, the VendorShell widget also instantiates and manages the
auxiliary area. Typically, the child of the VendorShell widget is a
container widget (such as the XmBulletinBoard or XmRowColumn widgets)
that can manage multiple Text and TextField widgets, which allow
multibyte character input from the user. In this scenario, all Text widgets
share the same input method.


Note - The status, preedit, and auxiliary areas are not accessible to the
application programmer. For example, it is not intended for the application
programmer to access the window ID of the status area. The user does not
need to worry about the instantiation or management of these components
as they are managed as required by the VendorShell widget class.


The application programmer has some control over the behavior of the
input method user interface components through the XmNpreeditType
resource of the VendorShell widget class. See "Input Methods" on
page 13 for a description of OffTheSpot and OverTheSpot modes.


Geometry management extends to all input method user interface 
components. When the application program window (a TopLevelShell 
widget) is resized, the input method user interface components are resized 
accordingly, and the preedited strings in them are rearranged as required. 
Of course, this assumes that the shell window has a resize policy of True. 


When the VendorShell widget is created, if a specific input method
requires a status area, preedit area, or both, the size of the VendorShell
considers the areas required by these components. The extra areas required
by the preedit and status areas are part of the VendorShell widget's area.
They are also managed by the VendorShell widget, if resizing is
necessary.


Because of the potential instantiation of these areas (status and preedit),
depending on the input method currently being used, the size of the
VendorShell widget area does not necessarily grow or shrink to
accommodate exactly the size of its child. The size of the VendorShell
widget area grows or shrinks to accommodate both its child's geometry and
the geometry of these input method user interface areas. There may be a
difference (for example, of 20 pixels) in height between the VendorShell
widget and its child widget (the MainWindow area). The width geometry is
not affected by the input method user interface components.


In summary, the requested size of the child is honored if possible; the 
actual size of the VendorShell may be larger than its child. 


The requests to specify the geometry of the VendorShell widget and its
child are honored as long as they do not conflict with each other and are
within the constraint of the VendorShell widget's ability to resize. When
they do conflict, the child widget's geometry request has higher precedence.
For example, if the size of the child widget is specified as 100x100 and the size
of VendorShell is also specified as 100x100, the resulting VendorShell has a
size of 100x120, while its child widget gets a size of 100x100. If the size of
the child widget is not specified, the VendorShell shrinks its child widget if
necessary to honor its own size specification. For example, if the size of
VendorShell is specified as 100x100 and no size is specified for its child, the
child widget has a size of 100x80. If the VendorShell widget is disabled
from resizing, regardless of what the geometry request of its child is, the
VendorShell widget honors only its own geometry specification.
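

The arithmetic behind the two examples can be written out as a short C model. It assumes the input method areas add a fixed extra height (20 pixels in the examples) and leave the width untouched; struct size, struct layout, and resolve_layout() are hypothetical names, and the case where the shell is disabled from resizing is not modeled.

```c
#include <stddef.h>

struct size   { int width; int height; };
struct layout { struct size shell; struct size child; };

/*
 * shell_req or child_req may be NULL, meaning "no size specified".
 * im_height is the extra height taken by the status and preedit areas.
 */
struct layout resolve_layout(const struct size *shell_req,
                             const struct size *child_req,
                             int im_height)
{
    struct layout result = { {0, 0}, {0, 0} };

    if (child_req != NULL) {
        /* The child's request has precedence: the shell grows around it. */
        result.child = *child_req;
        result.shell.width = child_req->width;
        result.shell.height = child_req->height + im_height;
    } else if (shell_req != NULL) {
        /* Only the shell is specified: the child shrinks to fit. */
        result.shell = *shell_req;
        result.child.width = shell_req->width;
        result.child.height = shell_req->height - im_height;
    }
    return result;
}
```

With both sizes requested as 100x100 and im_height set to 20, the model gives a 100x120 shell around a 100x100 child; with only the shell specified as 100x100, the child comes out as 100x80.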


Focus Management 


Languages with large numbers of characters (such as Japanese and
Chinese) require an input method that allows the user to compose
characters in that language interactively. This is because, for these
languages, there are many more characters than can be reasonably mapped
to a terminal keyboard.


The interactive process of composing characters in such languages is called
preediting. The preediting itself is handled by the input method. However,
the user interface of the preediting is determined by the system
environment. An interface needs to exist between the input method and the
system environment. This is done through the VendorShell widget of the
system environment.


Figure 4-3 illustrates a case with Japanese preediting. The string shown in
reverse video is the string in preediting. This string can be moved across
different windows by giving focus to the particular window. However, only
one preediting session can occur at one time.


Figure 4-3 Japanese preediting example


For an example of focus management, suppose a TopLevelShell widget (a
subclass of the VendorShell widget) has an XmBulletinBoard widget
child (MainWindow area), which has five XmText widgets as children.
Assume the locale requires the preedit area, and assume the OverTheSpot
mode is specified. Because the VendorShell widget manages only one
instance of an input method, you can run only one preedit area at a time
inside the TopLevelShell widget. If the focus is moved from one Text
widget to another, the current preedit string under construction is also
moved on top of the Text widget that currently has focus. Processing of
keys to the old Text widget is suspended temporarily. Subsequent interface
of the input method, such as the delivery of the string at preedit
completion, is made to the new, focused Text widget.


The string being preedited can be moved to the location of the focus; for 
example, by clicking the mouse. 


A string that the end user is finished preediting and that is already 
confirmed cannot be reconverted. Once the string is composed, it is 
committed. Committing a string means that it is moved from the preedit 
area to the focus point of the client. 


Internationalized User Interface Language


The capability to parse a multibyte character string as a string literal has
been added to the User Interface Language (UIL). A UIL file is written
according to the characteristics of the target language and compiled into
a User Interface Definition (UID) file.


Programming for Internationalized User Interface Language


The UIL compiler parses nonstandard charsets as locale text. This requires 
the UIL compiler to be run in the same locale as any locale text. 


If the locale text of a widget requires a font set (more than one font), the 
font set must be specified within the resource file. The font parameter does 
not support font sets. 


To use a specific language with UIL, a UIL file is written according to 
characteristics of the target language and compiled into a UID file. The 
UIL file that contains localized text needs to be compiled in the locale in 
which it is to run. 


String Literals


The following shows examples of literal strings. The cur_charset value is
always set to the default_charset value, which allows the string literal to
contain locale text.


To set locale text in the string literal with the default_charset value, enter
the following:


XmNlabelString = 'XXXXXX';

OR

XmNlabelString = #default_charset"XXXXXX";


Compile the UIL file with the LANG environment variable matching the
encoding of the locale text. Otherwise, the string literal is not compiled
properly.


Font Sets 


The font set cannot be set through UIL source programming. Whenever the 
font set is required, you must set it in the resource file as the following 
example shows: 


*fontList: -*-r-*-20-*:


Font Lists 


UIL has three functions that are used to create font lists: FONT, FONTSET, 
and FONT_TABLE. The FONT and FONTSET functions create font list entries. 
The FONT_TABLE function creates a font list from these font list entries. 


The FONT function creates a font list entry containing a font specification.
The argument is a string representing an XLFD font name. The FONTSET 
function creates a font list entry containing a font set specification. The 
argument is a comma-separated list of XLFD font names representing a 
base name font list. 


Both FONT and FONTSET have optional CHARACTER_SET declaration 
parameters that specify the font list element tag for the font list entry. In 
both cases, if no CHARACTER_SET declaration parameter is specified, UIL 
determines the font list element tag as follows: 


* If the module contains no CHARACTER_SET declaration and if the uil
command was called with the -s option or the Uil() function was
started with use_setlocale_flag set, the font list element tag is
XmFONTLIST_DEFAULT_TAG.


* Otherwise, the font list element tag is the code set component of the
LANG environment variable, if it is set in the UIL compilation
environment; or it is the value of XmFALLBACK_CHARSET if the LANG
environment variable is not set or has no code set.
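

To make "the code set component of the LANG environment variable" concrete, the following C sketch extracts it from a locale name. It assumes the common language[_territory][.codeset][@modifier] layout for LANG values, which the text above does not spell out; lang_codeset() is a hypothetical helper, and the fallback argument plays the role of XmFALLBACK_CHARSET.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies the code set component of lang into out. When lang is NULL or
 * has no code set component, copies fallback instead.
 * Returns 0 on success, -1 when the result does not fit in out.
 */
int lang_codeset(const char *lang, const char *fallback,
                 char *out, size_t out_size)
{
    const char *start = NULL;
    size_t len = 0;

    if (lang != NULL) {
        const char *dot = strchr(lang, '.');
        if (dot != NULL) {
            start = dot + 1;
            len = strcspn(start, "@");   /* stop before any @modifier */
        }
    }
    if (len == 0) {                      /* LANG unset or no code set */
        start = fallback;
        len = strlen(fallback);
    }
    if (len >= out_size)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Under these assumptions, ja_JP.eucJP yields eucJP, while en_US (no code set) and an unset LANG both yield the fallback value.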


The FONT_TABLE function creates a font list from a comma-separated list of 
font list entries created by FONT or FONTSET. The resulting font list can be 
used as the value of a font list resource. If a single font list entry is 
supplied as the value for such a resource, UIL converts the entry to a font 
list. 


Creating Resource Files 


If necessary, set the input method-related resources in the resource file as 
shown in the following example: 


*preeditType: OverTheSpot, OffTheSpot, Root, or None


Setting the Environment 


For a locale-sensitive application, place the UID file in the appropriate
directory. Set the UIDPATH or XAPPLRESDIR environment variable to the
appropriate value.


For example, to run the uil_sample program with an English
environment (LANG environment variable is en_US), place uil_sample.uid
with Latin characters in the $HOME/en_US directory, or place
uil_sample.uid in a directory and set the UIDPATH environment variable
to the full path name of the uil_sample.uid file.


To run the uil_sample program with a Japanese environment (LANG
environment variable is ja_JP), create a uil_sample.uid file with
Japanese (multibyte) characters in the $HOME/ja_JP directory, or place
uil_sample.uid in a unique directory and set the UIDPATH environment
variable to the full path name of the uil_sample.uid file. The following
list specifies the variables that can be used in the search path:


%U    Specifies the UID file string.

%N    Specifies the class name of the application.

%L    Specifies the value of the xnlLanguage resource or
      LC_CTYPE category.

%l    Specifies the language component of the
      xnlLanguage resource or the LC_CTYPE category.


If the XAPPLRESDIR environment variable is set, the
MrmOpenHierarchy() function searches for the UID file in the following
order:


1. UID file path name
2. $UIDPATH
3. %U
4. $XAPPLRESDIR/%L/uid/%N/%U
5. $XAPPLRESDIR/%l/uid/%N/%U
6. $XAPPLRESDIR/uid/%N/%U
7. $XAPPLRESDIR/%L/uid/%U
8. $XAPPLRESDIR/%l/uid/%U
9. $XAPPLRESDIR/uid/%U
10. $HOME/uid/%U
11. $HOME/%U
12. /usr/lib/X11/%L/uid/%N/%U
13. /usr/lib/X11/%l/uid/%N/%U
14. /usr/lib/X11/uid/%N/%U
15. /usr/lib/X11/%L/uid/%U
16. /usr/lib/X11/%l/uid/%U
17. /usr/lib/X11/uid/%U
18. /usr/include/X11/uid/%U
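

To show how one entry in the search order turns into a concrete path, the following C sketch substitutes the %L, %l, %N, and %U variables in a template. It is a model of the substitution only, not of MrmOpenHierarchy() itself: expand_uid_path() is a hypothetical helper, environment variables such as $XAPPLRESDIR are assumed to have been expanded already, and the class name used in the usage note is invented.

```c
#include <stddef.h>
#include <string.h>

/*
 * Expands %L (locale), %l (language), %N (class name), and %U (UID file)
 * in tmpl and writes the result to out. Any other character, including
 * an unrecognized % sequence, is copied unchanged.
 * Returns 0 on success, -1 when the result does not fit in out.
 */
int expand_uid_path(const char *tmpl, const char *locale,
                    const char *language, const char *class_name,
                    const char *uid_file, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;

    while (*tmpl != '\0') {
        const char *piece = NULL;
        char literal[2];
        size_t len;

        if (tmpl[0] == '%') {
            switch (tmpl[1]) {
            case 'L': piece = locale;     break;
            case 'l': piece = language;   break;
            case 'N': piece = class_name; break;
            case 'U': piece = uid_file;   break;
            default:  break;
            }
        }
        if (piece != NULL) {
            tmpl += 2;                    /* consume the %X sequence */
        } else {
            literal[0] = *tmpl++;         /* ordinary character */
            literal[1] = '\0';
            piece = literal;
        }
        len = strlen(piece);
        if (used + len >= out_size)       /* keep room for the terminator */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

For a ja_JP locale, the language ja, a hypothetical class name Sample, and the file uil_sample.uid, the template /usr/lib/X11/%L/uid/%N/%U expands to /usr/lib/X11/ja_JP/uid/Sample/uil_sample.uid.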


If the XAPPLRESDIR environment variable is not set, the
MrmOpenHierarchy() function uses $HOME instead of the XAPPLRESDIR
environment variable.


default_charset Character Set in UIL


With the default_charset string literal, any characters can be set as a valid
string literal. For example, if the LANG environment variable is el_GR, the
string literal with default_charset can contain any Greek character. If the
LANG environment variable is ja_JP, the default_charset string literal can
contain any Japanese character encoded in Japanese EUC.


If no character set is set to a string literal, the character set of the string
literal is set as cur_charset. And, in the system environment, the
cur_charset value is always set as default_charset.


Example: uil_sample


Figure 4-4 shows a UIL sample program on English and Japanese
environments.


Figure 4-4 Sample UIL program on English and Japanese environments


In the following sample program, LLL indicates locale text, which can be
Japanese, Korean, Traditional Chinese, Greek, French, or others.


uil_sample.uil


!
!     sample uil file - uil_sample.uil
!
!     C source file - uil_sample.c
!
!     Resource file - uil-sample.resource
!
module Test
    version = 'v1.0'
    names = case_sensitive
    objects = {
        XmPushButton = gadget;
    }

!************************************
!     declare callback procedure
!************************************
procedure
    exit_CB;

!***************************************************************
!     declare BulletinBoard as parent of PushButton and Text
!***************************************************************
object
    bb : XmBulletinBoard {
        arguments {
            XmNwidth = 500;
            XmNheight = 200;
        };
        controls {
            XmPushButton pb1;
            XmText text1;
        };
    };

!****************************
!     declare PushButton
!****************************
object
    pb1 : XmPushButton {
        arguments {
            XmNlabelString = #Normal "LLLexit buttonLLL";
            XmNx = 50;
            XmNy = 50;
        };
        callbacks {
            XmNactivateCallback = procedure exit_CB;
        };
    };

!*********************
!     declare Text
!*********************
    text1 : XmText {
        arguments {
            XmNx = 50;
            XmNy = 150;
        };
    };
end module;


/*
 *     C source file - uil_sample.c
 *
 */
#include <Mrm/MrmAppl.h>
#include <locale.h>

void exit_CB();

static MrmHierarchy hierarchy;
static MrmType *class;

/******************************************/
/*  specify the UID hierarchy list        */
/******************************************/
static char *aray_file[] =
    {"uil_sample.uid"
    };
static int num_file = (sizeof aray_file / sizeof aray_file[0]);

/******************************************************/
/*  define the mapping between UIL procedure names    */
/*  and their addresses                               */
/******************************************************/
static MRMRegisterArg reglist[] = {
    {"exit_CB", (caddr_t) exit_CB}
};

Compound Strings in UIL


Three mechanisms exist for specifying strings in UIL files:


* As string literals, which may be stored in UID files as either
null-terminated strings or compound strings

* As compound strings

* As wide character strings

Both string literals and compound strings consist of text, a character set,
and a writing direction. For string literals and for compound strings with
no explicit direction, UIL infers the writing direction from the character
set. The UIL concatenation operator (&) concatenates both string literals
and compound strings.


Regardless of whether UIL stores string literals in UID files as
null-terminated strings or as compound strings, it stores information about each
string's character set and writing direction along with the text. In general,
UIL stores string literals or string expressions as compound strings in UID
files under the following conditions:


* When a string expression consists of two or more literals with different
character sets or writing directions


* When the literal or expression is used as a value that has a compound
string data type (such as the value of a resource whose data type is
compound string)


UIL recognizes a number of keywords specifying character sets. UIL
associates parsing rules, including parsing direction and whether
characters have 8 or 16 bits, with each character set it recognizes. It is also
possible to define a character set using the UIL CHARACTER_SET function.


The syntax of a string literal is one of the following:


* '[character_string]'
* [#char_set]"[character_string]"

For each syntax, the character set of the string is determined as follows:


* For a string declared as 'character_string', the character set is the code
set component of the LANG environment variable, if it is set in the UIL
compilation environment; or it is the value of XmFALLBACK_CHARSET if
the LANG environment variable is not set or has no code set. By default,
the value of XmFALLBACK_CHARSET is ISO8859-1, but vendors may
supply different values.


* For a string declared as #char_set "string", the character set is char_set.


* For a string declared as "character_string", the character set depends on
whether the module has a CHARACTER_SET clause and whether the UIL
compiler's use_setlocale_flag is set.

  - If the module has a CHARACTER_SET clause, the character set is the
    one specified in that clause.

  - If the module has no CHARACTER_SET clause but the uil command was
    started with the -s option, or if the Uil() function was started with
    use_setlocale_flag set, UIL calls the setlocale() function and
    parses the string in the current locale. The character set of the
    resulting string is XmFONTLIST_DEFAULT_TAG.

  - If the module has no CHARACTER_SET clause and the uil command
    was started without the -s option, or if the Uil() function was started
    without use_setlocale_flag, the character set is the code set
    component of the LANG environment variable, if it is set in the UIL
    compilation environment, or the character set is the value of
    XmFALLBACK_CHARSET if LANG is not set or has no code set.
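

The decision rules above can be summarized as one C function. This is a model for reading the rules, not part of the UIL compiler: enum literal_form, literal_charset(), and the MODEL_FONTLIST_DEFAULT_TAG placeholder string are hypothetical. The lang_codeset argument is the code set component of LANG, or NULL when LANG is unset or has no code set; fallback stands for the value of XmFALLBACK_CHARSET.

```c
#include <stddef.h>

/* Placeholder string standing in for XmFONTLIST_DEFAULT_TAG. */
#define MODEL_FONTLIST_DEFAULT_TAG "XmFONTLIST_DEFAULT_TAG"

enum literal_form {
    SINGLE_QUOTED,      /* 'character_string'          */
    EXPLICIT_CHARSET,   /* #char_set "string"          */
    DOUBLE_QUOTED       /* "character_string"          */
};

const char *literal_charset(enum literal_form form,
                            const char *explicit_charset,
                            const char *module_charset,
                            int use_setlocale,
                            const char *lang_codeset,
                            const char *fallback)
{
    switch (form) {
    case EXPLICIT_CHARSET:
        return explicit_charset;
    case DOUBLE_QUOTED:
        if (module_charset != NULL)      /* CHARACTER_SET clause present */
            return module_charset;
        if (use_setlocale)               /* -s option or use_setlocale_flag */
            return MODEL_FONTLIST_DEFAULT_TAG;
        break;                           /* fall back to the LANG rule */
    case SINGLE_QUOTED:
        break;
    }
    /* Code set of LANG if available, otherwise the fallback character set. */
    return lang_codeset != NULL ? lang_codeset : fallback;
}
```

The model makes one point easy to see: a single-quoted literal never consults the module's CHARACTER_SET clause or the setlocale flag, whereas a double-quoted literal checks both before falling back to LANG.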


UIL always stores a string specified using the COMPOUND_STRING function 
as a compound string. This function takes as arguments a string expression 
and optional specifications of a character set, direction, and whether to 
append a separator to the string. If no character set or direction is 
specified, UIL derives it from the string expression, as described in the 
preceding section. 


Note - Certain predefined escape sequences, beginning with a \
(backslash), may appear in string literals, with the following
exceptions:
- A string in single quotation marks can span multiple lines, with
each newline character escaped by a backslash. A string in double
quotation marks cannot span multiple lines.
- Escape sequences are processed literally inside a string that is parsed
in the current locale (a localized string).


This chapter discusses tasks related to internationalizing with Xt and Xlib.


Locale Management
Font Management
Drawing Localized Text
Inputting Localized Text
Interclient Communications Conventions for Localized Text
Messages


Locale Management 


The following defines support for the locale mechanism that controls all 
locale-dependent Xlib and Common Desktop Environment functions. 


X Locale Management 


X locale supports one or more of the locales defined by the host environment.
Xlib conforms to the American National Standards Institute (ANSI) C
library, and the locale announcement method is the setlocale() function.
This function configures the locale operation of both the host C library and
Xlib. The operation of Xlib is governed by the LC_CTYPE category; this is called
the current locale.


The XSupportsLocale() function is used to determine whether the current
locale is supported by X.


The client is responsible for selecting its locale and X modifiers. Clients should
provide a means for the user to override the clients' locale selection at client
invocation. Most single-display X clients operate in a single locale for both X
and the host-processing environment. They configure the locale by calling
three functions: setlocale(), XSupportsLocale(), and
XSetLocaleModifiers().


The semantics of certain categories of X internationalization capabilities can 
be configured by setting modifiers. Modifiers are named by 
implementation-dependent and locale-specific strings. The only standard use 
for this capability at present is selecting one of several styles of keyboard input 
methods. 


The XSetLocaleModifiers() function is used to configure Xlib locale
modifiers for the current locale.


The recommended procedure for clients initializing their locale and modifiers
is to obtain locale and modifier announcers separately from one of the following
prioritized sources:


1. A command-line option 
2. A resource 
3. The empty string ("")


The first of these that is defined should be used. 
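

The priority rule can be sketched as a small C helper. It assumes the client has already looked up the command-line option and the resource and represents "not defined" as NULL; choose_announcer() is a hypothetical name. The same helper applies to the locale announcer and to the modifier announcer, which the text says are obtained separately.

```c
#include <stddef.h>

/*
 * Returns the first defined announcer in priority order:
 * command-line option, then resource, then the empty string.
 */
const char *choose_announcer(const char *command_line_value,
                             const char *resource_value)
{
    if (command_line_value != NULL)
        return command_line_value;
    if (resource_value != NULL)
        return resource_value;
    return "";
}
```

A client would typically pass the chosen locale announcer to setlocale() and the chosen modifier announcer to XSetLocaleModifiers(); with the empty string, both take their values from the host environment.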


Note - When a locale command-line option or locale resource is defined, the 
effect should be to set all categories to the specified locale, overriding any 
category-specific settings in the local host environment. 


Locale and Modifier Dependencies


The internationalized Xlib functions operate in the current locale configured by
the host environment and in the X locale modifiers set by the
XSetLocaleModifiers() function, or in the locale and modifiers configured
at the time some object supplied to the function was created. For each
locale-dependent function, Table 5-1 lists locale and modifier dependencies.


Table 5-1 Locale and Modifier Dependencies

Locale from...   Affects the Function...       In the...

Locale Query/Configuration
setlocale        XSupportsLocale               Locale queried
                 XSetLocaleModifiers           Locale modified

Resources
setlocale        XrmGetFileDatabase            Locale of XrmDatabase
                 XrmGetStringDatabase
XrmDatabase      XrmPutFileDatabase            Locale of XrmDatabase
                 XrmLocaleOfDatabase

Setting Standard Properties
setlocale        XmbSetWMProperties            Encoding of supplied/returned text
                                               (some WM_ property text in
                                               environment locale)
setlocale        XmbTextPropertyToTextList     Encoding of supplied/returned text
                 XwcTextPropertyToTextList
                 XmbTextListToTextProperty
                 XwcTextListToTextProperty

Text Input
setlocale        XOpenIM                       XIM input method
XIM              XCreateIC                     XIC input method configuration
                 XLocaleOfIM, etc.             Queried locale
XIC              XmbLookupString               Keyboard layout
                 XwcLookupString               Encoding of returned text

Text Drawing
setlocale        XCreateFontSet                Charsets of fonts in XFontSet
XFontSet         XmbDrawText,                  Locale of supplied text
                 XwcDrawText, etc.             Locale of supplied text
                 XExtentsOfFontSet, etc.       Locale-dependent metrics
                 XmbTextExtents,
                 XwcTextExtents, etc.

Xlib Errors
setlocale        XGetErrorDatabaseText         Locale of error message
                 XGetErrorText


Clients can assume that a locale-encoded text string returned by an X 
function can be passed to a C library function, or the string result of a C 
library function can be passed to an X function, if the locale is the same at 
the two calls. 


All text strings processed by internationalized Xlib functions are assumed
to begin in the initial state of the encoding of the locale, if the encoding is
state-dependent. All Xlib functions behave as if they do not change the
current locale or X modifier setting. (This means that any function,
provided within a library either by Xlib or by the application, that changes
the locale or calls the XSetLocaleModifiers() function with a nonnull
argument, must save and restore the current locale state on entry and
exit.) Also, Xlib functions on implementations that conform to the ANSI C
library do not alter the global state associated with the mblen(),
mbtowc(), wctomb(), and strtok() ANSI C functions.
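

The save-and-restore requirement can be illustrated with standard C alone. The sketch below covers only the C locale half; a library routine that also changes X modifiers would need to save and restore those as well, which is not shown. with_temporary_locale() is a hypothetical helper name.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Switches to temp_locale, runs work(arg) if work is not NULL, and then
 * restores the locale that was in effect on entry.
 * Returns 0 on success, -1 if the locale could not be saved or set.
 */
int with_temporary_locale(const char *temp_locale,
                          void (*work)(void *), void *arg)
{
    const char *current = setlocale(LC_ALL, NULL);  /* query only */
    char *saved;

    if (current == NULL)
        return -1;

    /* Copy the name: later setlocale() calls may overwrite the buffer. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_ALL, temp_locale) == NULL) {
        free(saved);
        return -1;                                  /* locale unchanged */
    }
    if (work != NULL)
        work(arg);

    setlocale(LC_ALL, saved);                       /* restore on exit */
    free(saved);
    return 0;
}
```

Copying the locale name before changing the locale matters because the string returned by setlocale() may be reused by the C library on the next call.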


Xt Locale Management


Xt locale management includes the following two functions:


* XtSetLanguageProc()
* XtDisplayInitialize()


XtSetLanguageProc


Before the initialization of the Xt Toolkit, applications should normally call
the XtSetLanguageProc() function as follows:


XtSetLanguageProc (NULL, NULL, NULL)


Note - The locale is not actually set until the toolkit is initialized (for
example, by way of the XtAppInitialize() function). Therefore, the
setlocale() function may be needed after the XtSetLanguageProc()
function and the initializing of the toolkit (for example, if calling the
catopen() function).


Resource databases are created in the current process locale. During 
display initialization prior to creating the per-screen resource database, the 
Intrinsics call out to a specified application procedure to set the locale
according to options found on the command line or in the per-display 
resource specifications. 


The callout procedure provided by the application is of type
XtLanguageProc, as in the following syntax:


typedef String (*XtLanguageProc)(displayID, languageID, clientdata);
        Display *displayID;
        String languageID;
        XtPointer clientdata;


displayID        Passes the display.


languageID       Passes the initial language value obtained from the
                 command line or server per-display resource
                 specifications.


clientdata       Passes the additional client data specified in the call
                 to the XtSetLanguageProc() function.


The language procedure allows an application to set the locale to the value
of the language resource determined by the XtDisplayInitialize()
function. The function returns a new language string that is subsequently
used by the XtDisplayInitialize() function to establish the path for
loading resource files. This string is cached and is the locale of the display.


Initially, no language procedure is set by the Intrinsics. To set the language
procedure for use by the XtDisplayInitialize() function, use the
XtSetLanguageProc() function:


XtLanguageProc XtSetLanguageProc(applicationcontext, procedure, clientdata)
        XtAppContext applicationcontext;
        XtLanguageProc procedure;
        XtPointer clientdata;


applicationcontext   Specifies the application context in which the
                     language procedure is to be used or specifies a null
                     value.


procedure            Specifies the language procedure.


clientdata           Specifies additional client data to be passed to the
                     language procedure when it is called.


The XtSetLanguageProc() function sets the language procedure that is
called from the XtDisplayInitialize() function for all subsequent
displays initialized in the specified application context. If the
applicationcontext parameter is null, the specified language procedure is
registered in all application contexts created by the calling process,
including any future application contexts that may be created. If the
procedure parameter is null, a default language procedure is registered.
The XtSetLanguageProc() function returns the previously registered
language procedure. If a language procedure has not yet been registered,
the return value is unspecified; but if this return value is used in a
subsequent call to the XtSetLanguageProc() function, it causes the
default language procedure to be registered.


The default language procedure does the following:


* Sets the locale according to the environment. On ANSI C-based systems,
  this is done by calling the setlocale(LC_ALL, language) function. If an
  error is encountered, a warning message is issued with the
  XtWarning() function.


* Calls the XSupportsLocale() function to verify that the current locale
  is supported. If the locale is not supported, a warning message is issued
  with the XtWarning() function and the locale is set to "C".


* Calls the XSetLocaleModifiers() function specifying the empty
  string.


* Returns the value of the current locale. On ANSI C-based systems, this
  is the return value from a final call to the setlocale(LC_CTYPE, NULL)
  function.
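

The C-library half of these steps can be modeled without the toolkit. This
is a rough sketch under stated assumptions, not the Intrinsics
implementation: model_default_language_proc() is a hypothetical name, the
locale_is_supported callback stands in for XSupportsLocale(), warnings go
to stderr instead of XtWarning(), and the XSetLocaleModifiers() step is
only marked by a comment.

```c
#include <locale.h>
#include <stdio.h>

/* Stand-ins for XSupportsLocale(), used to exercise both branches. */
int always_supported(void) { return 1; }
int never_supported(void)  { return 0; }

/* Model of the default language procedure's C-library steps. */
const char *model_default_language_proc(const char *language,
                                        int (*locale_is_supported)(void))
{
    if (language == NULL)
        language = "";          /* "" selects the locale from the environment */

    /* Step 1: set the locale; warn on failure. */
    if (setlocale(LC_ALL, language) == NULL)
        fprintf(stderr, "warning: locale \"%s\" could not be set\n", language);

    /* Step 2: fall back to "C" if the window system cannot handle it. */
    if (locale_is_supported != NULL && !locale_is_supported()) {
        fprintf(stderr, "warning: locale not supported, using \"C\"\n");
        setlocale(LC_ALL, "C");
    }

    /* Step 3 would call XSetLocaleModifiers(""); omitted in this model. */

    /* Step 4: report the resulting locale. */
    return setlocale(LC_CTYPE, NULL);
}
```

The returned string plays the role of the language string that the
XtDisplayInitialize() function caches for the display.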


A client can use this mechanism to establish a locale by calling the
XtSetLanguageProc() function prior to the XtDisplayInitialize()
function, as in the following example.


Widget top;
XtSetLanguageProc(NULL, NULL, NULL);
top = XtAppInitialize( ... );


XtDisplayInitialize


The XtDisplayInitialize() function first determines the language
string to be used for the specified display and loads the application's
resource database for this display-host-application combination from the
following sources in order of precedence:


1. Application command line (argv) 
2. Per-host user environment resource file on the local host 


3. Resource property on the server or user-preference resource file on the 
local host 


4. Application-specific user resource file on the local host 
5. Application-specific class resource file on the local host 
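

The effect of this precedence order on a single resource can be sketched as
a first-match lookup. This is a simplified model, assuming each source has
already been parsed into name/value entries listed from highest to lowest
precedence; the Intrinsics themselves merge resource databases rather than
search a list, and struct resource_entry and lookup_by_precedence() are
hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* One parsed resource specification and the source it came from. */
struct resource_entry {
    const char *source;   /* descriptive label, for example "command line" */
    const char *name;     /* resource name */
    const char *value;    /* resource value */
};

/* Return the value of the first entry that defines name, or NULL if no
 * source defines it. The entries must be ordered from highest to lowest
 * precedence, so an earlier source hides a later one. */
const char *lookup_by_precedence(const struct resource_entry *entries,
                                 size_t count, const char *name)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(entries[i].name, name) == 0)
            return entries[i].value;
    }
    return NULL;
}
```

With a command-line entry and a class resource file entry for the same
resource, the lookup returns the command-line value.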


The XtDisplayInitialize() function creates a unique resource
database for each display parameter specified. When a database is created,
a language string is determined for the display parameter in a manner
equivalent to the following sequence of actions.


The XtDisplayInitialize() function initially creates two temporary
databases. The first database is constructed by parsing the command line.
The second database is constructed from the string returned by the
XResourceManagerString() function or, if the
XResourceManagerString() function returns a null value, the contents
of a resource file in the user's home directory. The name for this
user-preference resource file is $HOME/.Xdefaults.


The database constructed from the command line is then queried for the
resource name.xnlLanguage, class class.XnlLanguage, where name and
class are the specified application name and application class. If this
database query is unsuccessful, the server resource database is queried; if
this query is also unsuccessful, the language is determined from the
environment. This is done by retrieving the value of the LANG environment
variable. If no language string is found, the empty string is used.
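

The fallback chain above can be summarized in a few lines of C. This is a
minimal sketch, assuming the two database queries have already been made
and each yields either a string or a null pointer; choose_language() and
choose_language_from_environment() are hypothetical names, not Xt
functions.

```c
#include <stdlib.h>

/* Pick the language string: command-line database first, then the server
 * resource database, then the environment, else the empty string. */
const char *choose_language(const char *cmdline_value,
                            const char *server_value,
                            const char *env_value)
{
    if (cmdline_value != NULL)
        return cmdline_value;
    if (server_value != NULL)
        return server_value;
    if (env_value != NULL)
        return env_value;
    return "";
}

/* Convenience wrapper that reads the LANG environment variable. */
const char *choose_language_from_environment(const char *cmdline_value,
                                             const char *server_value)
{
    return choose_language(cmdline_value, server_value, getenv("LANG"));
}
```

The resulting string is what the language procedure receives as its
language argument.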
The application-specific class resource file name is constructed from the
class name of the application. It points to a localized resource file that is
usually installed by the site manager when the application is installed. The
file is found by calling the XtResolvePathname() function with the
parameters (displayID, applicationdefaults, NULL, NULL, NULL, NULL,
0, NULL). This file should be provided by the developer of the application
because it may be required for the application to function properly. A
simple application that needs a minimal set of resources in the absence of
its class resource file can declare fallback resource specifications with the
XtAppSetFallbackResources() function.


The application-specific user resource file name points to a user-specific
resource file and is constructed from the class name of the application. This
file is owned by the application and typically stores user customizations. Its
name is found by calling the XtResolvePathname() function with the
parameters (displayID, NULL, NULL, NULL, path, NULL, 0, NULL),
where path is defined in an operating-system-specific manner. The path
variable is defined to be the value of the XUSERFILESEARCHPATH
environment variable if this is defined. Otherwise, the default is
vendor-defined.


If the resulting resource file exists, it is merged into the resource database. 
This file can be provided with the application or created by the user. 


The temporary database created from the server resource property or user 
resource file during language determination is then merged into the 
resource database. The server resource file is created entirely by the user 
and contains both display-independent and display-specific user 
preferences. 


If one exists, a user's environment resource file is then loaded and merged
into the resource database. This file name is user- and host-specific. The
user's environment resource file name is constructed from the value of the
user's XENVIRONMENT environment variable for the full path of the file. If
this environment variable does not exist, the XtDisplayInitialize()
function searches the user's home directory for the .Xdefaults-host file,
where host is the name of the machine on which the application is running.
If the resulting resource file exists, it is merged into the resource database.
The environment resource file is expected to contain process-specific
resource specifications that are to supplement those user-preference
specifications in the server resource file.
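

The file-name rule for the environment resource file can be sketched as
follows. This is an illustrative helper only: environment_resource_path()
is a hypothetical name, and the caller is assumed to supply the value of
XENVIRONMENT (or a null pointer if it is unset), the home directory, and
the host name.

```c
#include <stdio.h>

/* Build the user's environment resource file name into buf.
 * If xenvironment is non-null it is used as the full path; otherwise the
 * name is <home>/.Xdefaults-<host>. Returns the length of the name (as
 * snprintf does), or -1 if the name cannot be formed. */
int environment_resource_path(char *buf, size_t size,
                              const char *xenvironment,
                              const char *home, const char *host)
{
    if (xenvironment != NULL)
        return snprintf(buf, size, "%s", xenvironment);

    if (home == NULL || host == NULL)
        return -1;

    return snprintf(buf, size, "%s/.Xdefaults-%s", home, host);
}
```

A caller would typically pass getenv("XENVIRONMENT") and getenv("HOME")
and obtain the host name from the operating system.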
Font Management


International text drawing is done using a set of one or more fonts, as 
needed for the locale of the text. 


There are two methods of internationalized drawing within the system
environment: clients can use one of the static output widgets (for
example, XmLabel), or they can use the DrawingArea widget and draw
with any other primitive function.


Static output widgets require that text be converted to XmString.


The following information explains the mechanism for managing fonts 
using the Xlib routines and functions. 


Creating and Freeing a FontSet 


Xlib international text drawing is done using a set of one or more fonts, as
needed for the locale of the text. Fonts are loaded according to a list of base
font names supplied by the client and the charsets required by the locale.
The XFontSet is an opaque type.


* The XCreateFontSet() function is used to create an international text
  drawing font set.


* The XFontsOfFontSet() function is used to obtain a list of
  XFontStruct structures and full font names given an XFontSet.


* To obtain the base font name list and the selected font name list given
  an XFontSet, use the XBaseFontNameListOfFontSet() function.


* To obtain the locale name given an XFontSet, use the
  XLocaleOfFontSet() function.


* The XLocaleOfFontSet() function returns the name of the locale
  bound to the specified XFontSet as a null-terminated string.


* The XFreeFontSet() function frees the specified font set. The
  associated base font name list, font name list, XFontStruct list, and
  XFontSetExtents, if any, are freed.


Obtaining FontSet Metrics


Metrics for the internationalized text drawing functions are defined in 
terms of a primary draw direction, which is the default direction in which 
the character origin advances for each succeeding character in the string. 
The Xlib interface is currently defined to support only a left-to-right 
primary draw direction. The drawing origin is the position passed to the 
drawing function when the text is drawn. The baseline is a line drawn
through the drawing origin parallel to the primary draw direction. 
Character ink is the pixels painted in the foreground color and does not 
include interline or intercharacter spacing or image text background pixels. 


The drawing functions are allowed to implement implicit text direction 
control, reversing the order in which characters are rendered along the 
primary draw direction in response to locale-specific lexical analysis of the 
string. 


Regardless of the character rendering order, the origins of all characters
are on the primary draw direction side of the drawing origin. The screen
location of a particular character image may be determined with the
XmbTextPerCharExtents() or XwcTextPerCharExtents() functions.


The drawing functions are allowed to implement context-dependent 
rendering, where the glyphs drawn for a string are not simply a 
combination of the glyphs that represent each individual character. A 
string of two characters drawn with the XmbDrawString() function may
render differently than if the two characters were drawn with separate 
calls to the XmbDrawString() function. If the client adds or inserts a 
character in a previously drawn string, the client may need to redraw some 
adjacent characters to obtain proper rendering. 


The drawing functions do not interpret newline characters, tabs, or other 
control characters. The behavior when nonprinting characters are drawn 
(other than spaces) is implementation-dependent. It is the client’s 
responsibility to interpret control characters in a text stream. 
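

Because the drawing functions do not interpret newlines, a client has to
split the text itself and supply a baseline for each line. The following
sketch shows one way to do that; draw_multiline(), draw_line_fn, and the
example callback remember_baseline() are hypothetical names. It assumes
that the newline byte never occurs inside a multibyte character, which
holds for common encodings such as EUC and UTF-8 but should be checked for
the locale in use. In a real client the callback would typically wrap a
call such as XmbDrawString().

```c
#include <stddef.h>
#include <string.h>

/* Callback that draws one line of text at baseline origin (x, y). */
typedef void (*draw_line_fn)(const char *text, size_t length,
                             int x, int y, void *closure);

/* Split text at each '\n' and pass every line to draw_line, advancing the
 * baseline by line_height per line. Returns the number of lines drawn. */
int draw_multiline(const char *text, int x, int y, int line_height,
                   draw_line_fn draw_line, void *closure)
{
    int lines = 0;
    const char *start = text;

    for (;;) {
        const char *end = strchr(start, '\n');
        size_t length = (end != NULL) ? (size_t)(end - start) : strlen(start);

        draw_line(start, length, x, y + lines * line_height, closure);
        lines++;

        if (end == NULL)
            break;
        start = end + 1;        /* skip the newline itself */
    }
    return lines;
}

/* Example callback for experimentation: remembers the last baseline. */
void remember_baseline(const char *text, size_t length,
                       int x, int y, void *closure)
{
    (void)text;
    (void)length;
    (void)x;
    *(int *)closure = y;
}
```

A suitable line_height could be derived from the maximum logical extent of
the font set, as reported by the XExtentsOfFontSet() function.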


To find out about context-dependent rendering, use the
XContextDependentDrawing() function. The XExtentsOfFontSet()
function obtains the maximum extents structure given an XFontSet. The
XmbTextEscapement() and XwcTextEscapement() functions return the
escapement in pixels of the specified text. The
XmbTextExtents() and XwcTextExtents() functions obtain the overall
bounding box of the string's image and a logical bounding box
(overall_ink_return and overall_logical_return arguments respectively). The
XmbTextPerCharExtents() and XwcTextPerCharExtents() functions
return the text dimensions of each character of the specified text, using the
fonts loaded for the specified font set.


Drawing Localized Text 


The functions defined in this section draw text at a specified location in a
drawable. They are similar to the XDrawText(), XDrawString(), and
XDrawImageString() functions except that they work with font sets
instead of single fonts, and they interpret the text based on the locale of the
font set instead of treating the bytes of the string as direct font indexes. If
a BadFont error is generated, characters prior to the offending character
may have been drawn.


The text is drawn using the fonts loaded for the specified font set; the font 
in the graphics context (GC) is ignored and may be modified by the 
functions. No validation that all fonts conform to some width rule is 
performed. 


Use the XmbDrawText() or XwcDrawText() function to draw text using
multiple font sets in a given drawable. To draw text using a single font set
in a given drawable, use the XmbDrawString() or XwcDrawString()
function. To draw image text using a single font set in a given drawable,
use the XmbDrawImageString() or XwcDrawImageString() function.


Inputting Localized Text


The following discusses the Xlib and desktop mechanisms used for
international text input. If you are using Motif Text[Field] widgets or
you are using the XmIm APIs for text input, this section provides
background information. However, it will not impact your application
design or coding practice. If you are not interested in how character input
is achieved from the keyboard with low-level Xlib calls, you can proceed to
“Interclient Communications Conventions for Localized Text” on page 123.


Xlib Input Method Overview


This section provides definitions for terms and concepts used for 
internationalized text input and a brief overview of the intended use of the 
mechanisms provided by Xlib. 


A large number of languages in the world use alphabets consisting of a 
small set of symbols (letters) to form words. To enter text into a computer 
in an alphabetic language, a user usually has a keyboard on which there 
are key symbols corresponding to the alphabet. Sometimes, a few 
characters of an alphabetic language are missing on the keyboard. Many 
computer users who speak a Latin-alphabet-based language only have an 
English-based keyboard. They need to press a combination of keystrokes to 
enter a character that does not exist directly on the keyboard. A number of 
algorithms have been developed for entering such characters, known as 
European input methods, the compose input method, or the dead-keys 
input method. 


Japanese is an example of a language with a phonetic symbol set, where
each symbol represents a specific sound. There are two phonetic symbol
sets in Japanese: Katakana and Hiragana. In general, Katakana is used for
words that are of foreign origin, and Hiragana for writing native Japanese
words. Collectively, the two systems are called Kana. Hiragana consists of
83 characters; Katakana, 86 characters.


Korean also has a phonetic symbol set, called Hangul. Each of the 24 basic
phonetic symbols (14 consonants and 10 vowels) represents a specific sound.
A syllable is composed of two or three parts: the initial consonants, the 
vowels, and the optional last consonants. With Hangul, syllables can be 
treated as the basic units on which text processing is done. For example, a 
delete operation may work on a phonetic symbol or a syllable. Korean code 
sets include several thousands of these syllables. A user types the phonetic 
symbols that make up the syllables of the words to be entered. The display 
may change as each phonetic symbol is entered. For example, when the 
second phonetic symbol of a syllable is entered, the first phonetic symbol 
may change its shape and size. Likewise, when the third phonetic symbol is 
entered, the first two phonetic symbols may change their shape and size. 


Not all languages rely solely on alphabetic or phonetic systems. Some
languages, including Japanese and Korean, employ an ideographic writing
system. In an ideographic system, rather than taking a small set of symbols
and combining them in different ways to create words, each word consists
of one unique symbol (or, occasionally, several symbols). The number of
symbols may be very large: approximately 50,000 have been identified in
Hanzi, the Chinese ideographic system.


Two aspects of ideographic systems matter for their use on computers.
First, the standard computer character sets in Japan, China, and
Korea include roughly 8,000 characters, while sets in Taiwan have between
15,000 and 30,000 characters, which makes it necessary to use more than
one byte to represent a character. Second, it is obviously impractical to
have a keyboard that includes all of a given language's ideographic
symbols. Therefore a mechanism is required for entering characters so that
a keyboard with a reasonable number of keys can be used. Those input
methods are usually based on phonetics, but there are also methods based
on the graphical properties of characters.


In Japan, both Kana and Kanji are used. In Korea, Hangul and sometimes
Hanja are used. Now, consider entering ideographs in Japan, Korea, China,
and Taiwan.


In Japan, either Kana or English characters are entered and a region is
selected (sometimes automatically) for conversion to Kanji. Several Kanji
characters can have the same phonetic representation. If that is the case,
with the string entered, a menu of characters is presented and the user
must choose the appropriate option. If no choice is necessary or a
preference has been established, the input method does the substitution
directly. When Latin characters are converted to Kana or Kanji, it is called
a Romaji conversion.


In Korea, it is usually acceptable to keep Korean text in Hangul form, but
some people may choose to write Hanja-originated words in Hanja rather
than in Hangul. To change Hangul to Hanja, a region is selected for
conversion and the user follows the same basic method as described for
Japanese.


Probably because there are well-accepted phonetic writing systems for
Japanese and Korean, computer input methods in these countries for
entering ideographs are fairly standard. Keyboard keys have both English
characters and phonetic symbols engraved on them, and the user can
switch between the two sets.


The situation is different for Chinese. While there is a phonetic system
called Pinyin promoted by authorities, there is no consensus for entering
Chinese text. Some vendors use a phonetic decomposition (Pinyin or
another), others use ideographic decomposition of Chinese words, with
various implementations and keyboard layouts. There are about 16 known
methods, none of which is a clear standard.


Also, there are actually two ideographic sets used: Traditional Chinese (the
original written Chinese) and Simplified Chinese. Several years ago, the
People's Republic of China launched a campaign to simplify some
ideographic characters and eliminate redundancies altogether. Under the
plan, characters would be streamlined every five years. Characters have
been revised several times now, resulting in the smaller, simpler set that
makes up Simplified Chinese.


Input Method Architecture


As shown in the previous section, there are many different input methods 
used today, each varying with language, culture, and history. A common 
feature of many input methods is that the user can type multiple 
keystrokes to compose a single character (or set of characters). The process 
of composing characters from keystrokes is called preediting. It may require 
complex algorithms and large dictionaries involving substantial computer 
resources. 
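

As a concrete, deliberately tiny illustration of preediting, the following
sketch models a dead-key composer: an accent keystroke is held as preedit
state until the next keystroke arrives, and the pair is then committed as
one character. This is a toy model, not an Xlib input method. All names
(struct composer, compose_feed(), the rule table) are hypothetical, the
committed values are assumed to be ISO 8859-1 code points, and an accent
followed by a key with no rule is simply dropped.

```c
#include <stddef.h>

struct composer {
    char pending;               /* dead key held in the preedit, or 0 */
};

enum compose_result { COMPOSE_PENDING, COMPOSE_COMMIT };

struct compose_rule {
    char dead;                  /* accent keystroke */
    char base;                  /* following keystroke */
    unsigned int composed;      /* resulting ISO 8859-1 code point */
};

static const struct compose_rule rules[] = {
    { '\'', 'e', 0x00E9 },      /* e with acute */
    { '`',  'e', 0x00E8 },      /* e with grave */
    { '\'', 'a', 0x00E1 },      /* a with acute */
    { '"',  'u', 0x00FC }       /* u with diaeresis */
};

static int is_dead_key(char key)
{
    return key == '\'' || key == '`' || key == '"';
}

/* Feed one keystroke. Returns COMPOSE_PENDING when the key was absorbed
 * into the preedit state, or COMPOSE_COMMIT with *out set to the
 * character to hand to the client. */
enum compose_result compose_feed(struct composer *c, char key,
                                 unsigned int *out)
{
    size_t i;

    if (c->pending == 0) {
        if (is_dead_key(key)) {
            c->pending = key;   /* start preediting */
            return COMPOSE_PENDING;
        }
        *out = (unsigned char)key;
        return COMPOSE_COMMIT;
    }

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (rules[i].dead == c->pending && rules[i].base == key) {
            c->pending = 0;
            *out = rules[i].composed;
            return COMPOSE_COMMIT;
        }
    }

    /* No rule matches: this toy model discards the accent. */
    c->pending = 0;
    *out = (unsigned char)key;
    return COMPOSE_COMMIT;
}
```

Input methods for ideographic languages follow the same pattern of pending
state and commit, but with dictionaries and candidate selection in place
of a four-entry table.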


Input methods may require one or more areas in which to show the 
feedback of the actual keystrokes, to show ambiguities to the user, to list 
dictionaries, and so on. The following are the input method areas of 
concern. 


Status area      Intended to be a logical extension of the
                 light-emitting diodes (LEDs) that exist on the
                 physical keyboard. It is a window that is intended to
                 present the internal state of the input method that
                 is critical to the user. The status area may consist of
                 text data and bitmaps or some combination.


Preedit area     Intended to display the intermediate text for those
                 languages that are composing prior to the client
                 handling the data.


Auxiliary area   Used for pop-up menus and customizing dialog boxes
                 that may be required for an input method. There
                 may be multiple auxiliary areas for any input
                 method. Auxiliary areas are managed by the input
                 method independent of the client. Auxiliary areas
                 are assumed to be a separate dialog that is
                 maintained by the input method.


There are various user interaction styles used for preediting. The following 
are the preediting styles supported by Xlib. 


OnTheSpot        Data is displayed directly in the application window.
                 Application data is moved to allow preedit data to be
                 displayed at the point of insertion.


OverTheSpot      Data is displayed in a preedit window that is placed
                 over the point of insertion.


OffTheSpot       Preedit window is displayed inside the application
                 window but not at the point of insertion. Often, this
                 type of window is placed at the bottom of the
                 application window.


Root window      Preedit window is the child of RootWindow.


It would require a lot of computing resources if portable applications had to 
include input methods for all the languages in the world. To avoid this, a 
goal of the Xlib design is to allow an application to communicate with an 
input method placed in a separate process. Such a process is called an input 
server. The server to which the application should connect is dependent on 
the environment when the application is started up: what the user 
language is and the actual encoding to be used for it. The input method 
connection is said to be locale-dependent. It is also user-dependent; for a
given language, the user can choose, to some extent, the user-interface 
style of input method (if there are several choices). 


Using an input server implies communications overhead, but applications 
can be migrated without relinking. Input methods can be implemented 
either as a token communicating to an input server or as a local library. 


The abstraction used by a client to communicate with an input method is
an opaque data structure represented by the XIM data type. This data
structure is returned by the XOpenIM() function, which opens an input
method on a given display. Subsequent operations on this data structure
encapsulate all communication between client and input method. There is
no need for an X client to use any networking library or natural language
package to use an input method.


A single input server can be used for one or more languages, supporting one 
or more encoding schemes. But the strings returned from an input method 
are always encoded in the (single) locale associated with the XIM object. 


Input Contexts 


Xlib provides the ability to manage a multithreaded state for text input. A 
client may be using multiple windows, each window with multiple text 
entry areas, with the user possibly switching among them at any time. The 
abstraction for representing the state of a particular input thread is called 
an input context. The Xlib representation of an input context is an XIC. See
Figure 5-1 on page 117 for an illustration.


Figure 5-1 Input method and input contexts


An input context is the abstraction retaining the state, properties, and
semantics of communication between a client and an input method. An
input context is a combination of an input method, a locale specifying the
encoding of the character strings to be returned, a client window, internal
state information, and various layout or appearance characteristics. The
input context concept is to input roughly what the graphics context
abstraction is to graphics output.


One input context belongs to exactly one input method. Different input
contexts can be associated with the same input method, possibly with the
same client window. An XIC is created with the XCreateIC() function,
providing an XIM argument, affiliating the input context to the input
method for its lifetime. When an input method is closed with the
XCloseIM() function, no affiliated input contexts should be used again
(and should preferably be deleted before closing the input method).


Considering the example of a client window with multiple text entry areas,
the application programmer can choose to implement the following:


* As many input contexts are created as text-entry areas. The client can
  get the input accumulated on each context each time it looks up that
  context.


* A single context is created for a top-level window in the application. If
  such a window contains several text-entry areas, each time the user
  moves to another text-entry area, the client has to indicate changes in
  the context.


Application designers can choose a range of single or multiple input 
contexts, according to the needs of their applications. 


Keyboard Input 


To obtain characters from an input method, a client must call the
XmbLookupString() function or XwcLookupString() function with an
input context created from that input method. Both a locale and display are
bound to an input method when it is opened, and an input context
inherits this locale and display. Any strings returned by the
XmbLookupString() or XwcLookupString() function are encoded in that
locale.


Xlib Focus Management 


For each text-entry area in which the XmbLookupString() or 
XwcLookupString() function is used, there is an associated input context. 


When the application focus moves to a text-entry area, the application must
set the input context focus to the input context associated with that area.
The input context focus is set by calling the XSetICFocus() function with
the appropriate input context.


Also, when the application focus moves out of a text-entry area, the
application should unset the focus for the associated input context by
calling the XUnsetICFocus() function. As an optimization, if the
XSetICFocus() function is called successively on two different input
contexts, setting the focus on the second automatically unsets the focus on
the first.


Note - To set and unset the input context focus correctly, it is necessary to 
track application-level focus changes. Such focus changes do not necessarily 
correspond to X server focus changes. 


If a single input context is used to do input for multiple text-entry areas, it 
is also necessary to set the focus window of the input context whenever the 
focus window changes. 


Xlib Geometry Management 


In most input method architectures (OnTheSpot being the notable 
exception), the input method performs the display of its own data. To 
provide better visual locality, it is often desirable to have the input method 
areas embedded within a client. To do this, the client may need to allocate 
space for an input method. Xlib provides support that allows the client to 
provide the size and position of input method areas. The input method 
areas that are supported for geometry management are the status area and 
the preedit area. 


The fundamental concept on which geometry management for input method
windows is based is the proper division of responsibilities between the
client (or toolkit) and the input method. The division of responsibilities is
the following:


* The client is responsible for the geometry of the input method window.


* The input method is responsible for the contents of the input method
  window. It is also responsible for creating the input method window per
  the geometry constraints given to it by the client.


An input method can suggest a size to the client, but it cannot suggest a
placement. The input method can only suggest a size: it does not determine
the size, and it must accept the size it is given.


Before a client provides geometry management for an input method, it
must determine if geometry management is needed. The input method
indicates the need for geometry management by setting the
XIMPreeditArea or XIMStatusArea flag in its XIMStyles value
returned by the XGetIMValues() function. When a client decides to
provide geometry management for an input method, it indicates that
decision by setting the XNInputStyle value in the XIC.


After a client has established with the input method that it will do
geometry management, the client must negotiate the geometry with the
input method. The geometry is negotiated by the following steps:


* The client suggests an area to the input method by setting the
  XNAreaNeeded value for that area. If the client has no constraints for
  the input method, it either does not suggest an area or sets the width
  and height to 0 (zero). Otherwise, it sets one of the values.


* The client gets the XIC XNAreaNeeded value. The input method returns
  its suggested size in this value. The input method should pay attention
  to any constraints suggested by the client.


* The client sets the XIC XNArea value to inform the input method of the
  geometry of the input method's window. The client should try to honor
  the geometry requested by the input method. The input method must
  accept this geometry.
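

The three negotiation steps can be modeled as two small functions, one for
each party. This is a schematic model only, not Xlib code: struct area,
im_area_needed(), and client_assign_area() are hypothetical names, and the
policies shown (the input method adopts any nonzero dimension the client
suggests, and the client clips the request to the space it has) are
assumptions made for illustration. In a real client the values travel as
XNAreaNeeded and XNArea attributes of the XIC.

```c
struct area {
    int width;                  /* 0 means "no constraint" */
    int height;                 /* 0 means "no constraint" */
};

/* Input method side (step 2): answer the client's suggestion with the
 * size the method would like. Any nonzero dimension suggested by the
 * client is treated as fixed; the rest come from the method's preference. */
struct area im_area_needed(struct area preferred, struct area suggested)
{
    struct area needed = preferred;

    if (suggested.width > 0)
        needed.width = suggested.width;
    if (suggested.height > 0)
        needed.height = suggested.height;
    return needed;
}

/* Client side (step 3): decide the final area. The client tries to honor
 * the request but clips it to the space available; the input method must
 * accept whatever is returned. */
struct area client_assign_area(struct area needed, struct area available)
{
    struct area assigned = needed;

    if (assigned.width > available.width)
        assigned.width = available.width;
    if (assigned.height > available.height)
        assigned.height = available.height;
    return assigned;
}
```

For example, a client that fixes the width at 120 pixels and leaves the
height open receives a request of 120 pixels by the method's preferred
height, and then assigns as much of that height as its layout allows.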


Clients performing geometry management must be aware that setting other 
IC values may affect the geometry desired by an input method. For 
example, the XNFontSet and XNLineSpacing values may change the 
geometry desired by the input method. It is the responsibility of the client 
to renegotiate the geometry of the input method window when it is needed. 


In addition, a geometry management callback is provided by which an 
input method can initiate a geometry change. 


Event Filtering 


A filtering mechanism is provided to allow input methods to capture X
events transparently to clients. It is expected that toolkits (or clients) using
the XmbLookupString() or XwcLookupString() function call this filter
at some point in the event processing mechanism to make sure that events
needed by an input method can be filtered by that input method. If there is
no filter, a client can receive and discard events that are necessary for the
proper functioning of an input method. The following provides a few
examples of such events:


* Expose events that are on a preedit window in local mode.


* Events can be used by an input method to communicate with an input
  server. Such input server protocol-related events have to be intercepted
  if the user does not want to disturb client code.


* Key events can be sent to a filter before they are bound to translations
  such as Xt provides.


Clients are expected to get the XIC XNFilterEvents value and augment the
event mask for the client window with that event mask. This mask can be
0.


Callbacks


When an OnTheSpot input method is implemented, only the client can
insert or delete preedit data in place and possibly scroll existing text. This
means the echo of the keystrokes has to be achieved by the client itself,
tightly coupled with the input method logic.


When a keystroke is entered, the client calls the XmbLookupString() or
XwcLookupString() function. At this point, in the OnTheSpot case, the 
echo of the keystroke in the preedit has not yet been done. Before returning 
to the client logic that handles the input characters, the lookup function 
must call the echoing logic for inserting the new keystroke. If the 
keystrokes entered so far make up a character, the keystrokes entered need 
to be deleted, and the composed character is returned. The result is that, 
while being called by client code, input method logic has to call back to the 
client before it returns. The client code, that is, a callback routine, is called 
from the input method logic. 


There are a number of cases where the input method logic has to call back 
the client. Each of those cases is associated with a well-defined callback 
action. It is possible for the client to specify, for each input context, which 
callback is to be called for each action. 


There are also callbacks provided for feedback of status information and a 
callback to initiate a geometry request for an input method.


X Server Keyboard Protocol


This section discusses the server and keyboard groups. 


A keysym is the encoding of a symbol on a keycap. The goal of the server’s 
keysym mapping is to reflect the actual key caps on the physical keyboards. 
The user can redefine the keyboard by running the xmodmap command with 
the new mapping desired. 


X Version 11 Release 4 (X11R4) allows for definition of a bilingual keyboard 
at the server. The following describes this capability. 


A list of keysyms is associated with each key code. The following list 
discusses the set of symbols on the corresponding key: 


* If the list (ignoring trailing NoSymbol entries) is a single keysym K, the
list is treated as if it were the list K NoSymbol K NoSymbol.


* If the list (ignoring trailing NoSymbol entries) is a pair of keysyms K1
K2, the list is treated as if it were the list K1 K2 K1 K2.


* If the list (ignoring trailing NoSymbol entries) is three keysyms K1 K2
K3, the list is treated as if it were the list K1 K2 K3 NoSymbol.
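

The three expansion rules above can be modeled in a few lines of C. The
following is only a sketch: NO_SYMBOL is a stand-in for the Xlib NoSymbol
constant so that the code builds without X headers, and the function name
expand_keysym_list is hypothetical.

```c
#include <stddef.h>

/* Stand-in for the Xlib NoSymbol constant (an assumption of this sketch). */
#define NO_SYMBOL 0UL

/*
 * Expand the keysym list of one key code to four entries.
 * Trailing NO_SYMBOL entries are ignored before the rules are applied.
 * Lists with four or more significant entries are copied (first four only).
 */
static void expand_keysym_list(const unsigned long *list, size_t count,
                               unsigned long out[4])
{
    size_t i;

    while (count > 0 && list[count - 1] == NO_SYMBOL)
        count--;

    for (i = 0; i < 4; i++)
        out[i] = NO_SYMBOL;

    switch (count) {
    case 0:
        break;
    case 1:                     /* K  becomes  K NoSymbol K NoSymbol */
        out[0] = list[0];
        out[2] = list[0];
        break;
    case 2:                     /* K1 K2  becomes  K1 K2 K1 K2 */
        out[0] = list[0];
        out[1] = list[1];
        out[2] = list[0];
        out[3] = list[1];
        break;
    case 3:                     /* K1 K2 K3  becomes  K1 K2 K3 NoSymbol */
        out[0] = list[0];
        out[1] = list[1];
        out[2] = list[2];
        break;
    default:                    /* already four or more entries */
        for (i = 0; i < 4; i++)
            out[i] = list[i];
        break;
    }
}
```

A caller would pass the raw list for one key code and then read Group 1 from
out[0] and out[1] and Group 2 from out[2] and out[3].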


When an explicit void element is desired in the list, the VoidSymbol value 
can be used. 


The first four elements of the list are split into two groups of keysyms. 
Group 1 contains the first and second keysyms; Group 2 contains the third 
and fourth keysyms. Within each group, if the second element of the group 
is NoSymbol, the group is treated as if the second element were the same 
as the first element, except when the first element is an alphabetic keysym 
K for which both lowercase and uppercase forms are defined. In that case, 
the group is treated as if the first element is the lowercase form of K and 
the second element is the uppercase form of K. 


The standard rules for obtaining a keysym from an event make use of the 
Group 1 and Group 2 keysyms only; no interpretation of other keysyms in 
the list is given here. The modifier state determines which group to use. 
Switching between groups is controlled by the keysym named MODE 
SWITCH by attaching that keysym to some key code and attaching that 
key code to any one of the modifiers Mod1 through Mod5. This modifier is 
called the group modifier. For any key code, Group 1 is used when the 
group modifier is off, and Group 2 is used when the group modifier is on.


Within a group, the keysym to use is also determined by the modifier state.
The first keysym is used when the Shift and Lock modifiers are off. The
second keysym is used when the Shift modifier is on, or when the Lock
modifier is on and the second keysym is uppercase alphabetic, or
when the Lock modifier is on and is interpreted as ShiftLock. Otherwise,
when the Lock modifier is on and is interpreted as CapsLock, the state of
the Shift modifier is applied first to select a keysym; if that keysym is
lowercase alphabetic, the corresponding uppercase keysym is used instead.
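

The group and modifier rules can be followed step by step in code. The
sketch below is a model, not the Xlib implementation: it assumes a list
already expanded to four entries, treats only ASCII letters as alphabetic
keysyms, and uses hypothetical names (mod_state, select_keysym, NO_SYMBOL).

```c
#include <ctype.h>

/* Stand-in for the Xlib NoSymbol constant (an assumption of this sketch). */
#define NO_SYMBOL 0UL

enum lock_kind { LOCK_IS_CAPS, LOCK_IS_SHIFT };

struct mod_state {
    int shift;                  /* nonzero when the Shift modifier is on */
    int lock;                   /* nonzero when the Lock modifier is on */
    int group;                  /* nonzero when the group modifier is on */
    enum lock_kind lock_kind;   /* how the Lock modifier is interpreted */
};

/* ASCII-only helpers; real keysyms cover far more than ASCII letters. */
static int is_lower(unsigned long k) { return k < 128 && islower((int)k); }
static int is_upper(unsigned long k) { return k < 128 && isupper((int)k); }

static unsigned long select_keysym(const unsigned long list[4],
                                   const struct mod_state *m)
{
    /* Group 1 is entries 0 and 1; Group 2 is entries 2 and 3. */
    const unsigned long *g = m->group ? &list[2] : &list[0];
    unsigned long first = g[0];
    unsigned long second = g[1];

    /* A NoSymbol second element repeats the first element, or supplies
       the lowercase and uppercase pair for an alphabetic keysym. */
    if (second == NO_SYMBOL) {
        if (is_lower(first) || is_upper(first)) {
            second = (unsigned long)toupper((int)first);
            first = (unsigned long)tolower((int)first);
        } else {
            second = first;
        }
    }

    if (!m->shift && !m->lock)
        return first;
    if (m->shift)
        return second;

    /* From here on, Lock is on and Shift is off. */
    if (is_upper(second) || m->lock_kind == LOCK_IS_SHIFT)
        return second;

    /* CapsLock: Shift (off) selects the first keysym, which is then
       replaced by its uppercase form if it is lowercase alphabetic. */
    return is_lower(first) ? (unsigned long)toupper((int)first) : first;
}
```

With the list a NoSymbol 1 !, the sketch returns a with no modifiers, A with
Shift or CapsLock, and 1 or ! from Group 2 when the group modifier is on.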


No spatial geometry of the symbols on the key is defined by their order in 
the keysym list, although a geometry might be defined on a vendor-specific 
basis. The server does not use the mapping between key codes and 
keysyms. Rather, it stores it merely for reading and writing by clients. 


The KeyMask modifier named Lock is intended to be mapped to either a 
CapsLock or a ShiftLock key, but which one it is mapped to is left as an 
application-specific decision, user-specific decision, or both. However, it is 
suggested that users determine mapping according to the associated 
keysyms of the corresponding key code. 


Interclient Communications Conventions for Localized Text


The following information explains how components use Interclient
Communications Conventions (ICCC) to communicate text data and is 
offered as a guideline to understand how ICCC selections are performed. 
The XmText widget, XmTextField widget, and the dtterm command 
adhere to these guidelines. 


The toolkit is enhanced for internationalized ICCC compliance. The
selection mechanism of XmText, XmTextField, and dtterm is enhanced to
ensure proper matching of data and data encoding in any selection 
transaction. This includes standard cut-and-paste operations. 


For developers who use the toolkit to write their applications, the toolkit 
enables the application to be ICCC-compliant. However, for developers who
may use another non-ICCC-compliant toolkit to develop applications that
communicate with toolkit-based applications, the following may be helpful.


Owner of Selection


Any owner returns at least the following atom list when XA_TARGETS is
requested on some localized text:


* Atom for the code set of the current locale


* COMPOUND_TEXT


* XA_STRING


When XA_TEXT is requested, the owner returns its text as is with the 
encoding type of the property set to the code set of the current locale (no 
data conversion). An atom is created, representing the name of the code set 
of the locale. 


When COMPOUND_TEXT is requested, the owner converts its localized text to 
compound text and passes it with the property type of COMPOUND_TEXT. 


When XA_STRING is requested, the owner attempts to convert the localized 
text to XA_STRING. If the text string contains characters that cannot be 
converted to XA_STRING, the operation is unsuccessful. 


Note - XA_STRING is defined to be ISO8859-1.


Requester of Selection


A requester first requests XA_TARGETS when text data is to be
communicated with the selection owner.


The requester then searches for one of the following atoms in priority order:


* Atom for the code set of the requester’s locale


* COMPOUND_TEXT


* XA_STRING


* XA_TEXT
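

The priority search can be sketched without any X calls by working on atom
names. This is only an illustration: the function name choose_text_target
is hypothetical, the names STRING and TEXT are assumed to be the atom names
behind XA_STRING and XA_TEXT, and a real client would first convert the
returned atoms to names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Pick the target to request from the names the selection owner offered.
 * locale_codeset is the atom name for the code set of the requester's
 * locale, for example "ISO8859-1".
 * Returns NULL when none of the four candidates is offered.
 */
static const char *choose_text_target(const char *const *offered,
                                      size_t noffered,
                                      const char *locale_codeset)
{
    const char *priority[4];
    size_t p, i;

    priority[0] = locale_codeset;      /* code set of the requester's locale */
    priority[1] = "COMPOUND_TEXT";
    priority[2] = "STRING";            /* assumed name of XA_STRING */
    priority[3] = "TEXT";              /* assumed name of XA_TEXT; last resort */

    for (p = 0; p < 4; p++) {
        for (i = 0; i < noffered; i++) {
            if (strcmp(offered[i], priority[p]) == 0)
                return offered[i];
        }
    }
    return NULL;
}
```

The outer loop walks the priority order, so the position of a name in the
owner's list does not affect which target is chosen.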


If the code set of the requester’s locale matches one of the targets, the 
requester makes a request using the atom representing that code set. The 
XA_TEXT atom is used only if none of the other atoms is found. Because the 
owner returns a property with a type representing its encoding, the 
requester attempts to convert to the code set of its locale.


If the type COMPOUND_TEXT or XA_STRING is requested, the requester
attempts to convert the text property to the code set of its current locale by
using the XmbTextPropertyToTextList() or
XwcTextPropertyToTextList() function. These functions are used when the
owner client and requester client are running under different code sets.


When converting from COMPOUND_TEXT or XA_STRING, not all text data is
guaranteed to be converted; only those characters that are in common
between the owner and the requester will be converted.


XmClipboard 


XmClipboard is also enhanced to be ICCC-compliant in conjunction with
the XmText and XmTextField widgets. When text is being put on the
clipboard by way of the XmText and XmTextField widgets, the following
ICCC protocol is implemented:


When text is being retrieved from the clipboard by way of the XmText and
XmTextField widgets, the text from the clipboard is converted to the
encoding of the current locale from either COMPOUND_TEXT or XA_STRING.
All text on the clipboard is assumed to be in either the compound text
format or the string format.


Note - If text is put directly on the clipboard, the application needs to
specify the format, or encoding type in the form of an atom, along with the
text to put on the clipboard. Similarly, if text is retrieved directly from the
clipboard, the retrieving application needs to check the format to determine
the encoding of the data on the clipboard and take the appropriate
action.


Passing Window Title and Icon Name to Window Managers 


The default of the XtNtitleEncoding and XtNiconNameEncoding
resources for the VendorShell class is set to None. This is done only when
using the libXm.a library. The libXt.a library still retains XA_STRING as
the default for the resources.


This is done so that, as a default case, the XmNtitle and XmNiconName
resources are converted to a standard ICCC interchange, such as compound
text, based on the assumption that the text (title and icon name) is localized
text.


It is recommended that the user not set the XtNtitleEncoding and
XtNiconNameEncoding resources. Instead, ensure that the XtNtitle and
XtNiconName resources are strings encoded in the encoding of the
currently active locale of the running client. If the None value is used, the
toolkit converts the localized text to the standard ICCC style. (The
encoding communicated is COMPOUND_TEXT or XA_STRING.) If the
XtNtitleEncoding and XtNiconNameEncoding resources are set, the
XtNtitle and XtNiconName resources are not converted in any way and
are communicated to the Window Manager with the encoding specified.


Assuming the Window Manager being communicated with is
ICCC-compliant, that Window Manager is able to use the encoding type of
COMPOUND_TEXT or XA_STRING, or both.


When setting the XmNdialogTitle resource of the XmBulletinBoard
widget class, remember that there is a restriction on the charset segment.
For charsets that are not X Consortium-standard compound text encodings
or XmFONTLIST_DEFAULT_TAG-associated, the text segment is treated as
localized text. Localized text is converted to either compound text or
ISO8859-1 before being communicated to the Window Manager.


The Window Manager is enhanced so that it always converts the client title 
and icon name passed from clients to the encoding of its current locale, and 
an XmString is created using the XmFONTLIST_DEFAULT_TAG identifier. 
Thus, the client title and icon name are always drawn with the default font 
list entry of the Window Manager font list. 


Note - This allows clients running with different code sets but with similar
character sets to communicate their titles to the Window Manager. For
example, both a PC code client and an ISO8859-1 client can display their
titles regardless of the code set of the Window Manager.


Messages


Part of internationalizing a system environment toolkit-based application is
not to have any locale-specific data hardcoded within the application
source. One common locale-specific item is messages (error and warning)
returned by the application through standard I/O (input/output).


In general, for any error or warning messages to be displayed to the user 
through a system environment toolkit widget or gadget, the messages need 
to be externalized through message catalogs. 


For dialog messages to be displayed through a toolkit component, the 
messages need to be externalized through localized resource files. This is 
done in the same way as localizing resources, such as the XmLabel and
XmPushButton classes’ XmNlabelString resource or window titles.


For example, if a warning message is to be displayed through an
XmMessageBox widget class, the XmNmessageString resource cannot be
hardcoded within the application source code. Instead, the value of this 
resource needs to be retrieved from a message catalog. For an 
internationalized application expected to run in different locales, a distinct 
localized catalog must exist for each of the locales to be supported. In this 
way, the application need not be rebuilt. 
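

Retrieving the text from a message catalog can be sketched with the X/Open
catopen() and catgets() functions. The wrapper names below
(open_app_catalog, lookup_message) are hypothetical, and the guard against
a catalog that failed to open is a defensive assumption of this sketch
rather than something the standard requires of callers.

```c
#include <nl_types.h>

/*
 * Open the catalog for the current locale. NL_CAT_LOCALE asks catopen()
 * to use the LC_MESSAGES category, so the program is expected to have
 * called setlocale() earlier.
 */
static nl_catd open_app_catalog(const char *name)
{
    return catopen(name, NL_CAT_LOCALE);
}

/*
 * Return message msg_id from set set_id, or the fallback text when the
 * catalog could not be opened or the message is missing.
 */
static const char *lookup_message(nl_catd catd, int set_id, int msg_id,
                                  const char *fallback)
{
    if (catd == (nl_catd)-1)
        return fallback;
    return catgets(catd, set_id, msg_id, fallback);
}
```

The returned string would then be converted to an XmString, for example
with XmStringCreateLocalized(), and set as the XmNmessageString value.
Because a different catalog is found for each locale at run time, the
application itself does not change.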


The localized resource files can be put in the /opt/dt/app-defaults/%L
subdirectories or they can be pointed to by the XENVIRONMENT environment
variable. The %L variable indicates the locale used at run time.


The preceding two choices are left as design decisions for the application
developer.


Refer to the information in this appendix to write messages that are easily
internationalized.


File-Naming Conventions
Cause and Recovery Information
Comment Lines for Translators
Writing Style
Usage Statements
Regular Expression Standard Messages
Sample Messages


File-Naming Conventions 


The conventions used in naming files with user messages are discussed
here. Usually, the message source file has the suffix .msg; the generated
message catalog has the suffix .cat. There may be other such files related
to messages. The following criteria must be met for a file to have these
suffixes:


* It is X/Open-compliant.
* It becomes a *.cat file through the use of the gencat command.


Cause and Recovery Information


Whenever possible, explain to users exactly what has happened and what 
they can do to remedy the situation. 


The message Bad arg is not very helpful. However, the following message
tells users exactly what to do to make the command work: 


Do not specify more than 2 files on the command line 


Similarly, the message Line too long does not give users recovery
information. However, the following message gives users more specific 
recovery information: 


Line cannot exceed 20 characters 


If detailed recovery information is necessary for a given error message, add 
it to the appropriate place in online information or help. 


See “Sample Messages” for samples of original and rewritten
messages.


Comment Lines for Translators


A message source file should contain comments to help the translator in the 
process of translation. These comments will not be part of the message 
catalog generated. The comments are similar to C language comments to 
help document a program. A dollar sign ($) followed by a space will be 
interpreted by the translation tool and the gencat command as comments. 
The following is an example of a comment line in a message source file. 


$ This is a comment


Use comment lines to tell translators and writers what variables, such as 
%s, %c, and %d, represent. For example, note whether the variable refers to
such things as a user, file, directory, or flag. 


Place the comment line directly beneath the message to which it refers, 
rather than at the bottom of the message catalog. Global comments for an 
entire set can be placed directly below the $set directive in the source file. 


Specify in a comment line any messages within the message catalog that 
are obsolete.


Programming Format


For the programming format of messages, see the following list.


* Do not construct messages from clauses. Use flags or other means within
the program to pass information so that a complete message can be
issued at the proper time.


* Do not use hardcoded English text as a variable for a %s string in an
existing message. Doing so also constructs a message from clauses, and
the result is not translatable.


* Capitalize the first word of the sentence, and use a period at the end of
the sentence or phrase.


* End the last line of the message with \n (backslash followed by a
lowercase n, indicating a new line). This also applies to one-line
messages.


* Begin the second and remaining lines of a message with \t (backslash
followed by a lowercase t, indicating a tab).


* End all other lines with \n\ (backslash followed by a lowercase n,
followed by another backslash, indicating a new line).


* If, for some reason, the message should not end with a new line, use a
comment to tell the writers.


* Precede each message with the name of the command that called the
message, followed by a colon. The command name should precede the
component number in error messages. The command name is shown in
the following example as it should appear in a message:


OPIE “foo: Opening the file.”


Writing Style


The following guidelines on the writing style of messages include
terminology, punctuation, mood, voice, tense, capitalization, and other 
usage questions. 


* Use sentence format. One-line and one-sentence messages are
preferable.


* Add articles (a, an, the) when necessary to eliminate ambiguity.


* Capitalize the first word of the sentence and use a period at the end.


* Use the present tense. Do not allow future tense in a message. For
example, use the sentence:


The foo command displays a calendar.


Instead of:


The foo command will display a calendar.


* Do not use the first person (I or we) anywhere in messages.


* Avoid using the second person. Do not use the word you except in help
and interactive text.


* Use active voice. The first line is the original message. The second line is
the preferred wording.


MYNUM “Month and year must be entered as numbers.”


MYNUM “foo: 7777-222 Enter month and year as numbers.\n”


7777-222 is the message ID.


* Use the imperative mood (command phrase) and active verbs: specify,
use, check, choose, and wait are examples.


* State messages in a positive tone. The first line is the original message.
The second line is the preferred wording.


BADL “Don’t use the f option more than once.”


BADL “foo: 7777-009 Use the -f flag only once.\n”


* Do not use nouns as verbs. Use words only in the grammatical categories
shown in the dictionary. If a word is shown only as a noun, do not use it
as a verb. For example, do not solution a problem (or, for that matter,
architect a system).


* Do not use prefixes or suffixes. Translators may not understand words
beginning with re-, un-, in-, or non-, and the translations of messages
that use these prefixes or suffixes may not have the meaning you
intended. Exceptions to this rule occur when the prefix is an integral
part of a commonly used word. The words previous and premature are
acceptable; the word nonexistent is not.


* Do not use plurals. Do not use parentheses to show singular or plural, as
in error(s), which cannot be translated. If you must show singular and
plural, write error or errors. A better way is to condition the code so that
two different messages are issued depending on whether the singular or
plural of a word is required.


* Do not use contractions. Use the single word cannot to denote something
the system is unable to do.


* Do not use quotation marks. This includes both single and double
quotation marks. For example, do not use quotation marks around
variables such as %s, %c, and %d or around commands. Users may take
the quotation marks literally.


* Do not hyphenate words at the end of lines.


* Do not use the standard highlighting guidelines in messages, and do not
substitute initial or all caps for other highlighting practices.


* Do not use and/or. This construction does not exist in other languages.
Usually it is better to say or to indicate that it is not necessary to do
both.


* Use the 24-hour clock. Do not use a.m. or p.m. to specify time. For
example, write 1:00 p.m. as 1300.


* Avoid acronyms. Only use acronyms that are better known to your
audience than their spelled-out versions. To make a plural of an
acronym, add a lowercase s, without an apostrophe. Verify that it is not
a trademark before using it.


* Avoid the “no-no” words. Examples are abort, argument, and execute. See
the project glossary.


* Retain meaningful terminology. Keep as much of the original message
text as possible while ensuring that the message is meaningful and
translatable.


Usage Statements


The usage statement is generated by commands when at least one flag that
is not valid has been included in the command line. The usage statement
must not be used if only the data associated with a flag is missing or
incorrect. If this occurs, an error message unique to the problem is used.


* Show the command syntax in the usage statement. For example, a
possible usage statement for the del command reads:


Usage: del {File ...|-}


* Clauses defining the purpose of a command are to be removed.


* Capitalize the first letter of such words (parameters) as File, Directory,
String, Number, and so on only when used in a usage statement.


* Do not abbreviate parameters on the command line. It may be perfectly
obvious to experienced users that Num means Number, but spell it out to
ensure correct translation.


* Use only the following delimiters in usage statements:


Delimiter   Description

[]          Parameter is optional.

{}          There is more than one parameter choice, but one of
            the parameters is required. (See the following text.)

|           Choose one parameter only. [a|b] indicates that you
            can choose a or b or neither a nor b. {a|b} indicates
            that you must choose either a or b.

...         Parameter can be repeated on the command line.
            (Note that there is a space before the ellipsis.)

-           Standard input.


* A usage statement parameter does not require square brackets or braces
if it is required and is the only choice, as in the following:


banner String


* In usage statements, put a space between flags that must be separated
on the command line. For example:


unget [-n] [-rSID] [-s] {File|-}


* If flags can be used together without a separating space, do not separate
them with a space on the command line. For example:


wc [-cwl] {File ...|-}


* When the order of flags on the command line does not make a difference,
put them in alphabetical order. If the case is mixed, put lowercase
versions first:


get -aAijlmM


* Some usage statements can be long and involved. Use your best
judgment to determine where you should end lines in the usage
statement. The following example shows an old-style usage statement for
the get command:


Usage: get [-e|-k] [-cCutoff] [-iList] [-rSID] [-wString] [-xList]
[-b] [-gmnpst] [-l[p]] File ...

Retrieves a specified version of a Source Code Control System
(SCCS) file.


Standard Messages


* Certain commands have standard errors defined in POSIX.2
documentation. Follow the guidelines set up in POSIX.2, if applicable.


* Tell the user to Press the ------ key to select a key on the
keyboard, including the specific key to press (such as, Press Ctrl-D).


* Unless the system is overloaded, there is no need to tell the user to Try
again later. That should be obvious from the message.


* When writing message text, use the word parameter to describe text on
the command line; use the word value to indicate numeric data.


* Use the word flag rather than the words command option.


* Do not use commas to set off the one-thousandth place in values.
Do not use 1,000. Use 1000.


* If a message must be set off with an asterisk, use two asterisks at the
beginning of the message and two asterisks at the end of the message.


** Total **


* Use log in and log off as verbs.


Log in to the system; enter the data; then log off.


* Use user name, group name, and login as nouns.


The user name is sam.
The group name is staff.
The login directory is /u/sam.


* User number and group number refer to the number associated with the
user’s name and group.


* Do not use the term superuser. The root user may not have all privileges.


* Use the words command string to describe the command with its
parameters.


* Many of the same messages occur frequently. Table A-1 lists the new
standard message that replaces the old message.


Table A-1 New Standard Messages


Use the Following Standard Messages        Instead of These Messages

Cannot find or open the file.              Can’t open filename.
Cannot find or access the file.            Can’t access
The syntax of a parameter is not valid.    syntax error


Regular Expression Standard Messages 


Table A-2 lists the standard regular expression error messages, including 
the message number associated with each regular expression error: 


Table A-2 Regular Expression Standard Messages


Number  Use These Standard Messages       Instead of These Messages

11      Specify a range end point         Range end point too
        that is less than 256.            large.

16      The character or characters       Bad number.
        between \{ and \} must be
        numeric.

25      Specify a \digit between 1        \digit out of range.
        and 9 that is not greater
        than the number of
        subpatterns.

36      A delimiter is not correct        Illegal or missing
        or is missing.                    delimiter.

41      There is no remembered            No remembered search
        search string.                    string.

42      There is a missing \( or \).      \(\) imbalance.

43      Do not use \( more than 9         Too many \(.
        times.

44      Do not specify more than 2        More than two numbers
        numbers between \{ and \}.        given in \{ and \}.

45      An opening \{ must have a         } expected after \.
        closing \}.

46      The first number cannot           First number exceeds
        exceed the second number          second in \{ and \}.
        between \{ and \}.

48      Specify a valid end point to      Invalid end point in
        the range.                        range expression.

49      For each [ there must be a ].     [ ] imbalance.

50      The regular expression is         Regular expression
        too large for internal            overflow
        memory storage. Simplify
        the regular expression.


Sample Messages


These are examples of original messages and rewritten messages. The 
rewritten message follows each original message. 


AFLGKEYLTRS “Too Many -a Keyletters (Ad9)”

AFLGKEYLTRS “foo: 7777-007 Use the -a flag less than 11 times.\n”


FLGTWICE “Flag %c Twice (Ad4)”

FLGTWICE “foo: 7777-004 Use the %c header flag once.\n”


ESTAT “can’t access %s.\n”

ESTAT “foo: 7777-031 Cannot find or access %s.\n”


EMODE “foo: invalid mode\n”

EMODE “foo: 7777-033 A mode flag or value is not correct.\n”


DNORG “-d has no argument (ad1)”

DNORG “foo: 7777-001 Specify a parameter after the -d flag.\n”


FLOORRNG “floor out of range (ad23)”

FLOORRNG “foo: 7777-021 Specify a floor value greater than 0\n\
\tand less than 10000.\n”


AFLGARG “bad -a argument (ad8)”

AFLGARG “foo: 7777-006 Specify a user name, group name, or\n\
\tgroup number after the -a flag.\n”


BADLISFM “bad list format (ad27)”

BADLISFM “foo: 7777-025 Use numeric version and release\n\
\tnumbers.\n”
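

The rewritten samples share a few mechanical properties taken from the
programming format rules: a command name followed by a colon, a final new
line, and a tab at the start of each continuation line. A small checker for
those three properties might look like the following sketch. The function
name message_format_ok is hypothetical, and the checker works on the
message text after the catalog escapes have been expanded; it does not
judge wording or message numbers.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if msg starts with "command:", ends with a new line, and
 * begins every line after the first with a tab. Return 0 otherwise.
 */
static int message_format_ok(const char *msg, const char *command)
{
    size_t cmd_len = strlen(command);
    size_t len = strlen(msg);
    const char *nl;

    /* The command name, followed by a colon, precedes the message. */
    if (strncmp(msg, command, cmd_len) != 0 || msg[cmd_len] != ':')
        return 0;

    /* The last line ends with a new line, even for one-line messages. */
    if (len == 0 || msg[len - 1] != '\n')
        return 0;

    /* Each new line that is not the final one must be followed by a tab. */
    for (nl = strchr(msg, '\n'); nl != NULL && nl[1] != '\0';
         nl = strchr(nl + 1, '\n')) {
        if (nl[1] != '\t')
            return 0;
    }
    return 1;
}
```

Such a check could run over a message source file before gencat is
invoked, so that format slips are caught before translation begins.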